from:"Steve Ellcey"

RFC: lra-constraints.c and TARGET_HARD_REGNO_CALL_PART_CLOBBERED question/patch

2018-07-11 Thread Steve Ellcey

I have a reload/register allocation question and possible patch.  While
working on the Aarch64 SIMD ABI[1] I ran into a problem where GCC was
saving and restoring registers that it did not need to.  I tracked it
down to lra-constraints.c and its use of
targetm.hard_regno_call_part_clobbered on instructions that are not
calls.  Specifically need_for_call_save_p would check this macro even
when the instruction in question (unknown to need_for_call_save_p)
was not a call instruction.

This seems wrong to me and I was wondering if anyone more familiar
with the register allocator and reload could look at this patch and
tell me if it seems reasonable or not.  It passed bootstrap and I
am running tests now.  I am just wondering if there is any reason why
this target function would need to be called on non-call instructions
or if doing so is just an oversight/bug.

Steve Ellcey
sell...@cavium.com


[1] https://gcc.gnu.org/ml/gcc/2018-07/msg00012.html


2018-07-11  Steve Ellcey  

* lra-constraints.c (need_for_call_save_p): Add insn argument
and only check targetm.hard_regno_call_part_clobbered on calls.
(need_for_split_p): Add insn argument, pass to need_for_call_save_p.
(split_reg): Pass insn to need_for_call_save_p.
(split_if_necessary): Pass curr_insn to need_for_split_p.
(inherit_in_ebb): Ditto.


diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 7eeec76..7fc8e7f 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5344,7 +5344,7 @@ inherit_reload_reg (bool def_p, int original_regno,
 /* Return true if we need a caller save/restore for pseudo REGNO which
was assigned to a hard register.  */
 static inline bool
-need_for_call_save_p (int regno)
+need_for_call_save_p (int regno, rtx_insn *insn)
 {
   lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
   return (usage_insns[regno].calls_num < calls_num
@@ -5354,7 +5354,7 @@ need_for_call_save_p (int regno)
       ? lra_reg_info[regno].actual_call_used_reg_set
       : call_used_reg_set,
       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
-     || (targetm.hard_regno_call_part_clobbered
+     || (CALL_P (insn) && targetm.hard_regno_call_part_clobbered
      (reg_renumber[regno], PSEUDO_REGNO_MODE (regno);
 }
 
@@ -5374,7 +5374,8 @@ static bitmap_head ebb_global_regs;
assignment pass because of too many generated moves which will be
probably removed in the undo pass.  */
 static inline bool
-need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
+need_for_split_p (HARD_REG_SET potential_reload_hard_regs,
+     int regno, rtx_insn *insn)
 {
   int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
 
@@ -5416,7 +5417,8 @@ need_for_split_p (HARD_REG_SET 
potential_reload_hard_regs, int regno)
       || (regno >= FIRST_PSEUDO_REGISTER
       && lra_reg_info[regno].nrefs > 3
       && bitmap_bit_p (&ebb_global_regs, regno
-     || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
+     || (regno >= FIRST_PSEUDO_REGISTER
+     && need_for_call_save_p (regno, insn)));
 }
 
 /* Return class for the split pseudo created from original pseudo with
@@ -5536,7 +5538,7 @@ split_reg (bool before_p, int original_regno, rtx_insn 
*insn,
   nregs = hard_regno_nregs (hard_regno, mode);
   rclass = lra_get_allocno_class (original_regno);
   original_reg = regno_reg_rtx[original_regno];
-  call_save_p = need_for_call_save_p (original_regno);
+  call_save_p = need_for_call_save_p (original_regno, insn);
 }
   lra_assert (hard_regno >= 0);
   if (lra_dump_file != NULL)
@@ -5759,7 +5761,7 @@ split_if_necessary (int regno, machine_mode mode,
     && INSN_UID (next_usage_insns) < max_uid)
    || (GET_CODE (next_usage_insns) == INSN_LIST
    && (INSN_UID (XEXP (next_usage_insns, 0)) < max_uid)))
-   && need_for_split_p (potential_reload_hard_regs, regno + i)
+   && need_for_split_p (potential_reload_hard_regs, regno + i, insn)
    && split_reg (before_p, regno + i, insn, next_usage_insns, NULL))
 res = true;
   return res;
@@ -6529,7 +6531,8 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail)
      && usage_insns[j].check == curr_usage_insns_check
      && (next_usage_insns = usage_insns[j].insns) != NULL_RTX)
    {
-     if (need_for_split_p (potential_reload_hard_regs, j))
+     if (need_for_split_p (potential_reload_hard_regs, j,
+   curr_insn))
    {
      if (lra_dump_file != NULL && head_p)
    {

Re: RFC: lra-constraints.c and TARGET_HARD_REGNO_CALL_PART_CLOBBERED question/patch

2018-07-12 Thread Steve Ellcey

On Thu, 2018-07-12 at 07:17 +0100, Richard Sandiford wrote:
> 
> So it only calls targetm.hard_regno_call_part_clobbered if such a
> call is known to exist somewhere between the two references to
> regno (although we don't have the calls themselves to hand).
> 
> Thanks,
> Richard

Having the specific calls is the problem here because the normal ABI
and the SIMD ABI are going to have different values for caller_save
and part_clobbered.  But I think I have found a better way to address
this.

Looking at 'need_for_call_save_p', when 'flag_ipa_ra' is set, we look
at 'actual_call_used_reg_set' to see if we need to save a
register.  But even if a particular register doesn't show up as used,
if 'hard_regno_call_part_clobbered' is set for that register we are
going to return true and say we need a save/restore of the
register.  On Aarch64, if I 'really' didn't use the register I
shouldn't need to save/restore it.  If I used it but it is callee saved
then on 'normal' functions I may need to save/restore it (to protect
the upper bits) but on SIMD functions I do not need to save/restore it
at all because the callee will save/restore the entire register and not
just the lower 64 bits.

I think that the patch I want to create is: when 'flag_ipa_ra' is true,
remove the check of 'hard_regno_call_part_clobbered' from
'need_for_call_save_p'.  Instead, check
'hard_regno_call_part_clobbered' in 'process_bb_lives' (where we know
exactly what function is being called) and use that to set
'actual_call_used_reg_set'.  In process_bb_live we know exactly what
function we are calling and can check to see if it is a 'normal'
function or a SIMD function.

Steve Ellcey
sell...@cavium.com

RFC: Patch to implement Aarch64 SIMD ABI

2018-07-18 Thread Steve Ellcey

This is a patch to support the Aarch64 SIMD ABI [1] in GCC.  I intend
to eventually follow this up with two more patches; one to define the
TARGET_SIMD_CLONE* macros and one to improve the GCC register
allocation/usage when calling SIMD functions.

The significant difference between the standard ARM ABI and the SIMD ABI
is that in the normal ABI a callee saves only the lower 64 bits of registers
V8-V15, in the SIMD ABI the callee must save all 128 bits of registers
V8-V23.

This patch checks for SIMD functions and saves the extra registers when
needed.  It does not change the caller behavour, so with just this patch
there may be values saved by both the caller and callee.  This is not
efficient, but it is correct code.

This patch bootstraps and passes the GCC testsuite but that only verifies
I haven't broken anything, it doesn't validate the handling of SIMD functions.
I tried to write some tests, but I could never get GCC to generate code
that would save the FP callee-save registers in the prologue.  Complex code
might generate spills and fills but it never triggered the prologue/epilogue
code to save V8-V23.  If anyone has ideas on how to write a test that would
cause GCC to generate this code I would appreciate some ideas.  Just doing
lots of calculations with lots of intermediate values doesn't seem to be enough.

Steve Ellcey
sell...@cavium.com

[1] 
https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi


2018-07-18  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_function_p): New function.
(aarch64_layout_frame): Check for simd function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (FP_SIMD_SAVED_REGNUM_P): New define.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1369704..b25da11 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1026,6 +1026,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1404,6 +1413,18 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_function_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl;
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -4034,6 +4055,7 @@ aarch64_layout_frame (void)
 {
   HOST_WIDE_INT offset = 0;
   int regno, last_fp_reg = INVALID_REGNUM;
+  bool simd_function = aarch64_simd_function_p (cfun->decl);
 
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
@@ -4068,7 +4090,8 @@ aarch64_layout_frame (void)
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
-	&& !call_used_regs[regno])
+	&& (!call_used_regs[regno]
+	|| (simd_function && FP_SIMD_SAVED_REGNUM_P (regno
   {
 	cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
 	last_fp_reg = regno;
@@ -4105,7 +4128,8 @@ aarch64_layout_frame (void)
   {
 	/* If there is an alignment gap between integer and fp callee-saves,
 	   allocate the last fp register to it if possible.  */
-	if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
+	if (regno == last_fp_reg && has_align_gap
+	&& !simd_function && (offset & 8) == 0)
 	  {
 	cfun->machine->frame.reg_offset[regno] = max_int_offset;
 	break;
@@ -4117,7 +4141,7 @@ aarch64_layout_frame (void)
 	else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
 		 && cfun->machine->frame.wb_candidate1 >= V0_REGNUM)
 	  cfun->mach

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-19 Thread Steve Ellcey

On Thu, 2018-07-19 at 08:31 +0100, Richard Sandiford wrote:
> 
> > @@ -4706,8 +4730,11 @@ aarch64_process_components (sbitmap
> > components, bool prologue_p)
> >    while (regno != last_regno)
> >  {
> >    /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved
> > -  so DFmode for the vector registers is enough.  */
> > -  machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode;
> > +  so DFmode for the vector registers is enough.  For simd functions
> > + we want to save the entire register.  */
> > +  machine_mode mode = GP_REGNUM_P (regno) ? E_DImode
> > + : (aarch64_simd_function_p (cfun->decl) ? E_TFmode : E_DFmode);
> This condition also occurs in aarch64_push_regs and aarch64_pop_regs.
> It'd probably be worth splitting it out into a subfunction.
> 
> I think you also need to handle the writeback cases, which should work
> for Q registers too.  This will mean extra loadwb_pair and storewb_pair
> patterns.
> 
> LGTM otherwise FWIW.

Yes, I see where I missed this in aarch64_push_regs
and aarch64_pop_regs.  I think that is why the second of
Wilco's two examples (f2) is wrong.  I am unclear about
exactly what is meant by writeback and why we have it and
how that and callee_adjust are used.  Any chance someone
could help me understand this part of the prologue/epilogue
code better?  The comments in aarch64.c/aarch64.h aren't
really helping me understand what the code is doing or
why it is doing it.

Steve Ellcey
sell...@cavium.com

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-20 Thread Steve Ellcey

On Fri, 2018-07-20 at 11:11 +, Wilco Dijkstra wrote:

> Steve Ellcey wrote:
> 
> > Yes, I see where I missed this in aarch64_push_regs
> > and aarch64_pop_regs.  I think that is why the second of
> > Wilco's two examples (f2) is wrong.  I am unclear about
> > exactly what is meant by writeback and why we have it and
> > how that and callee_adjust are used.  Any chance someone
> > could help me understand this part of the prologue/epilogue
> > code better?  The comments in aarch64.c/aarch64.h aren't
> > really helping me understand what the code is doing or
> > why it is doing it.

> Writeback is the same as a base update in a load or store. When
> creating the frame there are 3 stack adjustments to be made:
> creating stack for locals, pushing callee-saves and reserving space
> for outgoing arguments. We merge these stack adjustments as much as
> possible and use load/store with writeback for codesize and performance.
> See the last part in layout_frame for the different cases.

OK, I think I understand this a bit better now.  I think my main
problem is with the  term 'writeback' which I am not used to seeing.
But if I understand things correctly we are saving one or two registers
and (possibly) updating the stack pointer using auto-increment/auto-
decrement in one instruction and that the updating of SP is what you
mean by 'writeback'.  Correct?

Steve Ellcey
sell...@cavium.com

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-23 Thread Steve Ellcey

Here is an updated version of my patch for the Aarch64 SIMD ABI.  I
think the writeback register saves are correct now and I improved the
register allocation by defining REG_ALLOC_ORDER.  I also added clobbers
to expand_call when calling a non-SIMD function from a SIMD function.

I am still testing but any feedback on what I have so far would be
helpful.

Steve Ellcey
sell...@cavium.com

2018-07-23  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_components_for_bb): Check for simd function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (REG_ALLOC_ORDER): New define.
(HONOR_REG_ALLOC_ORDER): Ditto.
(FP_SIMD_SAVED_REGNUM_P): Ditto.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(loadwb_pair_): New instruction.
("storewb_pair_): Ditto.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1369704..d7557e2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1026,6 +1026,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1404,6 +1413,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl;
+}
+
+static
+machine_mode aarch64_reg_save_mode (tree fndecl, unsigned regno)
+{
+  return GP_REGNUM_P (regno)
+	   ? E_DImode
+	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -1498,6 +1527,13 @@ aarch64_is_noplt_call_p (rtx sym)
   return false;
 }
 
+static bool
+aarch64_is_simd_call_p (rtx sym)
+{
+  tree decl = SYMBOL_REF_DECL (sym);
+  return  decl && aarch64_simd_decl_p (decl);
+}
+
 /* Return true if the offsets to a zero/sign-extract operation
represent an expression that matches an extend operation.  The
operands represent the paramters from
@@ -4034,6 +4070,7 @@ aarch64_layout_frame (void)
 {
   HOST_WIDE_INT offset = 0;
   int regno, last_fp_reg = INVALID_REGNUM;
+  bool simd_function = aarch64_simd_decl_p (cfun->decl);
 
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
@@ -4068,7 +4105,8 @@ aarch64_layout_frame (void)
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
-	&& !call_used_regs[regno])
+	&& (!call_used_regs[regno]
+	|| (simd_function && FP_SIMD_SAVED_REGNUM_P (regno
   {
 	cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
 	last_fp_reg = regno;
@@ -4105,7 +4143,8 @@ aarch64_layout_frame (void)
   {
 	/* If there is an alignment gap between integer and fp callee-saves,
 	   allocate the last fp register to it if possible.  */
-	if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
+	if (regno == last_fp_reg && has_align_gap
+	&& !simd_function && (offset & 8) == 0)
 	  {
 	cfun->machine->frame.reg_offset[regno] = max_int_offset;
 	break;
@@ -4117,7 +4156,7 @@ aarch64_layout_frame (void)
 	else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
 		 && cfun->machine->

[Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available

2018-07-24 Thread Steve Ellcey

This is a patch for PR 86538, to define an __ARM_FEATURE_LSE macro
when LSE is available.  Richard Earnshaw closed PR 86538 as WONTFIX
because the ACLE (Arm C Language Extension) does not require this
macro and because he is concerned that it might encourage people to
use inline assembly instead of the __sync and atomic intrinsics.
(See actual comments in the defect report.)

While I agree that we want people to use the intrinsics I still think
there are use cases where people may want to know if LSE is available
or not and there is currrently no (simple) way to determine if this feature
is available since it can be turned or and off independently of the
architecture used.  Also, as a general principle, I  think any feature
that can be toggled on or off by the compiler should provide a way for
users to determine what its state is.

So what do other ARM maintainers and users think?  Is this a useful
feature to have in GCC?

Steve Ellcey
sell...@cavium.com


2018-07-24  Steve Ellcey  

PR target/86538
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Add define of __ARM_FEATURE_LSE.


diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 40c738c..e057ba9 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -154,6 +154,9 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SM4, "__ARM_FEATURE_SM4", pfile);
   aarch64_def_or_undef (TARGET_F16FML, "__ARM_FEATURE_FP16_FML", pfile);
 
+  /* This is not required by ACLE, but it is useful.  */
+  aarch64_def_or_undef (TARGET_LSE, "__ARM_FEATURE_LSE", pfile);
+
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A
  16-bit floating-point extensions.  */

Re: [Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available

2018-07-24 Thread Steve Ellcey

On Tue, 2018-07-24 at 22:04 +0100, James Greenhalgh wrote:
> 
> 
> I'd say this patch isn't desirable for trunk. I'd be interested in use cases
> that need a static decision on presence of LSE that are not better expressed
> using higher level language features.
> 
> Thanks,
> James

How about when building the higher level features?  Right now,
in sysdeps/aarch64/atomic-machine.h, we
hardcode ATOMIC_EXCHANGE_USES_CAS to 0.  If we had __ARM_FEATURE_LSE we
could use that to determine if we wanted to set
ATOMIC_EXCHANGE_USES_CAS to 0 or 1 which would affect the call
generated in nptl/pthread_spin_lock.c.  That would be useful if we
built a lipthread specifically for a platform that had LSE.

Steve Ellcey
sell...@cavium.com

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-25 Thread Steve Ellcey

Here is version 3 of my patch to implement the SIMD ABI on Aarch64.
I am having a problem with how to handle a SIMD function calling a
non-SIMD function.  When this happens the SIMD function needs to save
V8 to V23 because it cannot count on the non-SIMD function to save
all 128 bits of these registers.

I thought I had this working in the last patch but as I write test
cases, it appears that it is not working and I am not sure how to
implement it.  I tried adding clobbers in aarch64_expand_call but
that is not working (see code in this patch in aarch64_expand_call).
If I add them to 'call' which is a parallel insn, they are ignored.
If I find the underlying call instruction that is part of the parallel
then the clobbers get added to the instruction but then the call itself
is not recognized with the extra clobbers in place.  I don't think we
want to add new call instructions in aarch64.md to handle the vector
register saves and restores.  Am I trying to add the clobbers in the
wrong place?  Where and when should extra clobbers be added to a call
that is going to clobber more registers than what is indicated by
CALL_USED_REGISTERS?

I suppose I could use TARGET_HARD_REGNO_CALL_PART_CLOBBERED but I would
have to extend it to include the call instruction as an argument so the
the code could determine if the call being made was to a simd or non-simd
function.

Steve Ellcey
sell...@cavium.com


2018-07-25  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_components_for_bb): Check for simd function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (REG_ALLOC_ORDER): New define.
(HONOR_REG_ALLOC_ORDER): Ditto.
(FP_SIMD_SAVED_REGNUM_P): Ditto.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(loadwb_pair_): New instruction.
("storewb_pair_): Ditto.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..cc642f5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl;
+}
+
+static
+machine_mode aarch64_reg_save_mode (tree fndecl, unsigned regno)
+{
+  return GP_REGNUM_P (regno)
+	   ? E_DImode
+	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -1499,6 +1528,13 @@ aarch64_is_noplt_call_p (rtx sym)
   return false;
 }
 
+static bool
+aarch64_is_simd_call_p (rtx sym)
+{
+  tree decl = SYMBOL_REF_DECL (sym);
+  return  decl && aarch64_simd_decl_p (decl);
+}
+
 /* Return true if the offsets to a zero/sign-extract operation
represent an expression that matches an extend operation.  The
operands represent the paramters from
@@ -3269,10 +3305,11 @@ aarch64_split_sve_subreg_move (rtx dest, rtx ptrue, rtx src)
 }
 
 static bool
-aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
- tree exp ATTRIBUTE_

[Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-07-31 Thread Steve Ellcey

Here is a new version of my patch to support the Aarch64 SIMD ABI [1]
in GCC.  I think this is complete enought to be considered for check
in.  I wrote a few new tests and put them in a new gcc.target/torture
directory so they would be run with multiple optimization options.  I
also verified that there are no regressions in the GCC testsuite.

The significant difference between the standard ARM ABI and the SIMD
ABI is that in the normal ABI a callee saves only the lower 64 bits of 
registers V8-V15, in the SIMD ABI the callee must save all 128 bits of
registers V8-V23.

As I mentioned in my RFC, I intend to (eventually) follow this patch
with two more, one to define the TARGET_SIMD_CLONE* macros and one to
improve the GCC register allocation/usage when calling SIMD
functions.  Right now, a caller calling a SIMD function will save more
registers than it needs to because some of those registers will also be
saved by the callee.

Steve Ellcey
sell...@cavium.com

[1] https://developer.arm.com/products/software-development-tools/hpc/a
rm-compiler-for-hpc/vector-function-abi

Compiler ChangeLog:

2018-07-31  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.

Testsuite ChangeLog:

2018-07-31  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..9e6827a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is

[patch, libtool] Top-level libtool.m4 patch for autoconf 2.68

2013-08-20 Thread Steve Ellcey


Some of the binutils directories (libgloss and newlib) have moved from
autoconf 2.64 to 2.68.  While running autoconf 2.68 on these directories
I get some warnings coming from libtool.m4 which is at the top level
of the GCC and binutils trees.  Fixes for these warnings are already in
the upstream libtool sources and the changes work fine with autoconf 2.64
as well as autoconf 2.68 so I would like to put these fixes into the
libtool.m4 at the GCC and binutils top-levels.

I ran autoconf in various GCC and binutils directories to verify that no
signficant changes ocurred in any directories, is this patch OK to check
in to the GCC tree?  I will send seperate email to the binutils mailing
list though perhaps permission to check it in to the GCC tree is sufficient
to put it in binutils as well?

Steve Ellcey
sell...@mips.com


2013-08-20  Steve Ellcey  

* libtool.m4 (_LT_SYS_MODULE_PATH_AIX): Put AC_LANG_PROGRAM
call in brackets.
(irix*|nonstopux*): use AC_LANG_SOURCE in AC_LINK_IFELSE call.


diff --git a/libtool.m4 b/libtool.m4
index 8a14e2b..7119b35 100644
--- a/libtool.m4
+++ b/libtool.m4
@@ -1079,7 +1079,7 @@ m4_defun([_LT_DARWIN_LINKER_FEATURES],
 # to the aix ld manual.
 m4_defun([_LT_SYS_MODULE_PATH_AIX],
 [m4_require([_LT_DECL_SED])dnl
-AC_LINK_IFELSE(AC_LANG_PROGRAM,[
+AC_LINK_IFELSE([AC_LANG_PROGRAM],[
 lt_aix_libpath_sed='
 /Import File Strings/,/^$/ {
/^0/ {
@@ -4926,7 +4926,16 @@ _LT_EOF
# implicitly export all symbols.
 save_LDFLAGS="$LDFLAGS"
 LDFLAGS="$LDFLAGS -shared ${wl}-exported_symbol ${wl}foo 
${wl}-update_registry ${wl}/dev/null"
-AC_LINK_IFELSE(int foo(void) {},
+   AC_LINK_IFELSE(
+ [AC_LANG_SOURCE(
+[AC_LANG_CASE([C], [[int foo (void) { return 0; }]],
+  [C++], [[int foo (void) { return 0; }]],
+  [Fortran 77], [[
+  subroutine foo
+  end]],
+  [Fortran], [[
+  subroutine foo
+  end]])])],
   _LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs 
$compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && 
func_echo_all "${wl}-set_version ${wl}$verstring"` ${wl}-update_registry 
${wl}${output_objdir}/so_locations ${wl}-exports_file ${wl}$export_symbols -o 
$lib'
 )
 LDFLAGS="$save_LDFLAGS"

Re: [PATCH] Fix PR57451 (Incorrect debug ranges emitted for -freorder-blocks-and-partition -g)

2013-08-22 Thread Steve Ellcey

On Mon, 2013-08-19 at 22:21 -0700, Teresa Johnson wrote:

> 2013-08-19  Teresa Johnson  
> 
> PR rtl-optimizations/57451
> * final.c (reemit_insn_block_notes): Prevent lexical blocks
> from crossing split section boundaries.
> 
> * testsuite/g++.dg/tree-prof/pr57451.C: New test.

Teresa,

This patch is causing a problem in my mips compiler.  I am creating a
cross-toolchain running on x86 targeting mips-mti-linux-gnu and if I
compile glibc using a GCC that has this patch all of the programs I
compile go into an infinite loop when run under the qemu simulator.  I
think it is the dynamic linker that is getting miscompiled, but I
haven't tracked down the exact nature of the miscompilation yet.  Has
anyone else reported any problems with it?  I am adding Richard
Sandiford to the email list since he is a mips expert and also might
have some insight to using next_active_insn vs. next_real_insn vs.
next_insn, though it doesn't look like you have changed what is used
here.

Steve Ellcey
sell...@mips.com

Re: [PATCH] Fix PR57451 (Incorrect debug ranges emitted for -freorder-blocks-and-partition -g)

2013-08-23 Thread Steve Ellcey

On Fri, 2013-08-23 at 07:38 -0700, Teresa Johnson wrote:

> > It should be committed soon, by Kaz or myself once testing finishes.
> > Please let me know if that doesn't fix your failures.
> 
> Fix committed as r201941. Let me know if you still have issues.
> 
> Teresa

This patch fixes the problem for me.  Thanks Teresa, Thanks Kaz.

Steve Ellcey

Re: [patch, libgfortran, configure] Cross-compile support for libgfortran

2013-09-23 Thread Steve Ellcey

On Mon, 2013-09-23 at 16:26 +0100, Marcus Shawcroft wrote:
> On 4 June 2013 20:49, Steve Ellcey  wrote:
> > This patch allows me to build libgfortran for a cross-compiling toolchain
> > using newlib.  Currently the checks done by AC_CHECK_FUNCS_ONCE fail with
> > my toolchain because the compile/link fails due to the configure script not
> > using the needed linker script in the link command.  The check for 
> > with_newlib
> > is how libjava deals with the problem and it fixes my build problems.
> >
> > My only concern is defining HAVE_STRTOLD, strtold exists in my newlib but
> > I am not sure if that is true for all newlib builds.  I didn't see any
> > flags that I could easily use to check for long double support in the
> > libgfortran configure.ac, but it seems to assume that the type exists.
> >
> > OK to checkin?
> 
> This patch breaks systems where long double is wider than double.  The
> newlib implementation provides strtold for systems where double is as
> wide as long double but not on systems where long double is wider.
> 
> I don;t see any issues with AC_CHECK_FUNC_ONCE when cross configuring
> aarch64 toolchains but I would have thought that fixing the link test
> issue you encountered would be preferable to hard coding assumptions
> in the configure script?
> 
> Cheers
> /Marcus

AC_CHECK_FUNC_ONCE may work for aarch64 but I don't think there is
anyway to make it work generally for all platforms.  In the libjava
configure.ac file is the comments:

[dnl Botheration.  Now we've got to detect the exception model.
dnl Link tests against libgcc.a are problematic since -- at least
dnl as of this writing -- we've not been given proper -L bits for
dnl single-tree newlib and libgloss.

and:

   # We are being configured with a cross compiler.  AC_REPLACE_FUNCS
   # may not work correctly, because the compiler may not be able to
   # link executables.

The libstdc++ configure has comments to the same effect:

  # This lets us hard-code the functionality we know we'll have in the cross
  # target environment.  "Let" is a sugar-coated word placed on an especially
  # dull and tedious hack, actually.
  #
  # Here's why GLIBCXX_CHECK_MATH_SUPPORT, and other autoconf macros
  # that involve linking, can't be used:
  #"cannot open sim-crt0.o"
  #"cannot open crt0.o"
  # etc.  All this is because there currently exists no unified, consistent
  # way for top level CC information to be passed down to target directories:
  # newlib includes, newlib linking info, libgloss versus newlib crt0.o, etc.
  # When all of that is done, all of this hokey, excessive AC_DEFINE junk for
  # crosses can be removed.

The libstdc++ configure file has something that looks like it might be
intended to address the problem of long double support
(long_double_math_on_this_cpu) but I don't see where that variable is
ever set in any configure file even though it is used in the libstdc++
configure file.

Steve Ellcey
sell...@mips.com

Re: RFA: Store the REG_BR_PROB probability directly as an int

2013-09-25 Thread Steve Ellcey

On Tue, 2013-09-24 at 21:07 +0200, Andreas Schwab wrote:
> Richard Sandiford  writes:
> 
> > Sorry for the breakage.  I think we need to handle INT_LIST in the same way
> > as INSN_LIST though, and eliminate in XEXP (x, 1).
> >
> > How about the attached?  Testing in progress...
> 
> Works for me as well.
> 
> Andreas.

This patch worked for me as well on MIPS.  I did a complete build and
test overnight.

Steve Ellcey
sell...@mips.com

Re: [PATCH] Fix libgfortran cross compile configury w.r.t newlib

2013-09-26 Thread Steve Ellcey

On Thu, 2013-09-26 at 14:47 +0100, Marcus Shawcroft wrote:

> I'm in two minds about whether further sticky tape of this form is the 
> right approach or whether the original patch should be reverted until a 
> proper fix that does not regress the tree can be found.
> 
> Thoughts?
> 
> 2013-09-26  Marcus Shawcroft  
> 
>  * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement
>  dependent on gcc_no_link.
> 
> Cheers
> /Marcus

I think this patch is a good fix.  I (obviously) don't favor reverting
the previous patch because that would re-break the Fortran build on MIPS
bare-metal cross compilers (or any compiler where a linker script is
needed).  Any 'proper' fix should address libstdc++, libjava, and other
libraries as well as libgfortran and I don't know what a cleaner fix
would be.  In fact I would say the other libraries should consider using
this fix.  The only reason they don't run into this problem too is that
they don't depend on any long double functions or any other functions
that are optionally built by newlib.

I will test this patch on my targets and make sure it works for me, but
I don't see why it would not.

Steve Ellcey
sell...@mips.com

Re: [PATCH] Fix libgfortran cross compile configury w.r.t newlib

2013-09-27 Thread Steve Ellcey

On Thu, 2013-09-26 at 14:47 +0100, Marcus Shawcroft wrote:

> I'm in two minds about whether further sticky tape of this form is the 
> right approach or whether the original patch should be reverted until a 
> proper fix that does not regress the tree can be found.
> 
> Thoughts?
> 
> 2013-09-26  Marcus Shawcroft  
> 
>  * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement
>  dependent on gcc_no_link.
> 
> Cheers
> /Marcus

Well, I thought this patch would work for me, but it does not.  It looks
like gcc_no_link is set to 'no' on my target because, technically, I can
link even if I don't use a linker script.  I just can't find any
functions.

% cat x.c
int main (void) { return 0; }
% mips-mti-elf-gcc x.c -o x
/local/home/sellcey/nightly/install-mips-mti-elf/lib/gcc/mips-mti-elf/4.9.0/../../../../mips-mti-elf/bin/ld:
 warning: cannot find entry symbol __start; defaulting to 00400098
% echo $?
0

% cat y.c
int main (void) { exit (0); }
% install-mips-mti-elf/bin/mips-mti-elf-gcc y.c -o y
/local/home/sellcey/nightly/install-mips-mti-elf/lib/gcc/mips-mti-elf/4.9.0/../../../../mips-mti-elf/bin/ld:
 warning: cannot find entry symbol __start; defaulting to 00400098
/tmp/ccdG78PN.o: In function `main':
y.c:(.text+0x14): undefined reference to `exit'
collect2: error: ld returned 1 exit status
ubuntu-sellcey % echo $?
1

Steve Ellcey
sell...@mips.com

Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib

2013-10-01 Thread Steve Ellcey

On Tue, 2013-10-01 at 12:40 +0100, Marcus Shawcroft wrote:
> On 30/09/13 13:40, Marcus Shawcroft wrote:
> 
> >> Well, I thought this patch would work for me, but it does not.  It looks
> >> like gcc_no_link is set to 'no' on my target because, technically, I can
> >> link even if I don't use a linker script.  I just can't find any
> >> functions.
> >>
> 
> > In which case gating on gcc_no_link could be replaced with a test that
> > looks to see if we can link with the library.  Perhaps looking for
> > exit() or some such that might reasonably be expected to be present.
> >
> > For example:
> >
> > AC_CHECK_FUNC(exit)
> > if test "x${with_newlib}" = "xyes" -a "x${ac_cv_func_exit}" = "xno"; then
> >
> > /Marcus

This patch works on my mips-mti-elf target.

Steve Ellcey
sell...@mips.com

Re: [patch, mips] Patch for new mips triplet - mips-mti-elf

2012-10-01 Thread Steve Ellcey

On Sun, 2012-09-30 at 19:53 +0100, Richard Sandiford wrote:

> Sorry for only noticing now, but this produced:
> 
> ERROR: gcc.target/mips/pr37362.c  -O0 : syntax error in target selector 
> "target ! mips*-sde-elf mips*-mti-elf" for " dg-do 2 compile { target { ! 
> mips*-sde-elf mips*-mti-elf } } "
> ...
> 
> We need another set of braces.  Tested on mipsisa64-elf and applied.
> 
> Richard

Thanks for fixing this, I am not sure why I didn't notice it in my
testing.

Steve Ellcey
sell...@mips.com

[patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread Steve Ellcey

The gcc.target/mips/ext_ins.c was failing in little endian mode on MIPS because
the compiler is smart enough now to see that 'c' is uninitialized and it can
insert the field 'a' into 'c' with a shift and a full store instead of an
insert because the store just overwrites unintialized data.  I changed the
code to force the compiler to preserve the other fields of 'c' and that makes
it use the insert instruction in both big and little endian modes.

Tested on mips-mti-elf.

OK to checkin?

Steve Ellcey
sell...@mips.com



2012-10-08  Steve Ellcey  

* gcc.target/ext_ins.c: Modify f2 to aviod uninitialized data.


diff --git a/gcc/testsuite/gcc.target/mips/ext_ins.c 
b/gcc/testsuite/gcc.target/mips/ext_ins.c
index f0169bc..36f0f3f 100644
--- a/gcc/testsuite/gcc.target/mips/ext_ins.c
+++ b/gcc/testsuite/gcc.target/mips/ext_ins.c
@@ -18,9 +18,8 @@ NOMIPS16 unsigned int f1 (struct A a)
   return a.j;
 }
 
-NOMIPS16 void f2 (int i)
+NOMIPS16 struct A f2 (struct A a, int i)
 {
-  struct A c;
-  c.j = i;
-  func (c);
+  a.j = i;
+  return a;
 }

[patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os

2012-10-08 Thread Steve Ellcey

The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization
level does not do whichever optimization it is that results in a bbit instead
of a bbit[01]l.  I would like to skip this test for -Os the way it already gets
skipped for -O0.

Tested on mips-mti-elf.  Ok for checkin?

Steve Ellcey
sell...@mips.com



2012-10-08  Steve Ellcey  

* gcc.target/octeon-bbit-2.c: Skip for -Os optimization level.


diff --git a/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c 
b/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
index 9bd8dce..7d88d68 100644
--- a/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
+++ b/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=octeon -mbranch-likely -fno-unroll-loops" } */
-/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-Os" } { "" } } */
 /* { dg-final { scan-assembler "\tbbit\[01\]\t" } } */
 /* { dg-final { scan-assembler-not "\tbbit\[01\]l\t" } } */
 /* { dg-final { scan-assembler "\tbnel\t" } } */

Re: [patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os

2012-10-08 Thread Steve Ellcey

On Mon, 2012-10-08 at 11:09 -0700, Mike Stump wrote:
> On Oct 8, 2012, at 9:21 AM, Steve Ellcey  wrote:
> > The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization
> > level does not do whichever optimization it is that results in a bbit 
> > instead
> > of a bbit[01]l.  I would like to skip this test for -Os the way it already 
> > gets
> > skipped for -O0.
> > 
> > Tested on mips-mti-elf.  Ok for checkin?
> 
> Ideally I'd like a mips expert to weigh in on this.  The issue is, is the 
> code smaller with the other instruction?
> If so, is there a reasonable way to obtain that type of win more often in the 
> port with -Os?  Now, if you are that
> mips expert, that's fine, but, trivially you don't need my approval to check 
> it in.  If the code is larger,
> trivially, the patch is ok.  If the optimization generally hurt code size and 
> can't be made to win, the patch is ok.
> If always the same size, it would seem ok.   I just don't have the mips 
> specific background to know which case this
> is.

Well, I checked -O1, -O2 and -Os.  The -Os code is smaller then -O1 but
larger then -O2.  I didn't dig deep enough to find out exactly which
optimization is causing the change in instruction usage.  Perhaps
Richard Sandiford will have an opinion on this change.

Steve Ellcey
sell...@mips.com

[patch, mips] Modify mips-mti-linux-gnu build

2012-10-30 Thread Steve Ellcey

This patch makes a few changes to the work I have been doing on the
mips-mti-linux-gnu target that I recently added.  It adds an -mabi=64 build
to MULTILIBS and changes the order of MULTILIB_OPTIONS to be consistent with
the mips-mti-elf target.

It also makes the N32 ABI the default when compiling for mips64* targets,
this is also to be consistent with the mips-mti-elf target.

Tested with a cross-build and running the GCC testsuite using qemu.

OK to checkin?

Steve Ellcey
sell...@mips.com



2012-10-30  Steve Ellcey  

* config/mips/mti-linux.h (SYSROOT_SUFFIX_SPEC): Change order
and add mabi=64.
(DRIVER_SELF_SPECS): Make -n32 the default on mips64* archs.
* config/mips/t-mti-linux (MULTILIB_OPTIONS): Change order.
(MULTILIB_DIRNAMES): Ditto.
(MULTILIB_EXCEPTIONS): New.


diff --git a/gcc/config/mips/mti-linux.h b/gcc/config/mips/mti-linux.h
index 36c003c..cda9bdc 100644
--- a/gcc/config/mips/mti-linux.h
+++ b/gcc/config/mips/mti-linux.h
@@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 /* This target is a multilib target, specify the sysroot paths.  */
 #undef SYSROOT_SUFFIX_SPEC
 #define SYSROOT_SUFFIX_SPEC \
-
"%{mips32:/mips32}%{mips64:/mips64}%{mips64r2:/mips64r2}%{msoft-float:/sof}%{mel|EL:/el}%{mabi=64:/64}%{mabi=n32:/n32}"
+
"%{mips32:/mips32}%{mips64:/mips64}%{mips64r2:/mips64r2}%{mabi=64:/64}%{mel|EL:/el}%{msoft-float:/sof}"
 
 #undef DRIVER_SELF_SPECS
 #define DRIVER_SELF_SPECS  \
@@ -36,6 +36,10 @@ along with GCC; see the file COPYING3.  If not see
   /* Infer the -msynci setting from -march if not explicitly set.  */  \
   MIPS_ISA_SYNCI_SPEC, \
\
+  /* If no ABI option is specified, infer one from the ISA level   \
+ or -mgp setting.  */  \
+  "%{!mabi=*: %{" MIPS_32BIT_OPTION_SPEC ": -mabi=32;: -mabi=n32}}",   \
+   \
   /* Base SPECs.  */   \
   BASE_DRIVER_SELF_SPECS   \
\
diff --git a/gcc/config/mips/t-mti-linux b/gcc/config/mips/t-mti-linux
index ba11706..6d280cd 100644
--- a/gcc/config/mips/t-mti-linux
+++ b/gcc/config/mips/t-mti-linux
@@ -19,6 +19,15 @@
 # The default build is mips32r2, hard-float big-endian.  Add mips32,
 # soft-float, and little-endian variations.
 
-MULTILIB_OPTIONS = mips32/mips64/mips64r2 msoft-float EL
-MULTILIB_DIRNAMES = mips32 mips64 mips64r2 sof el
+MULTILIB_OPTIONS = mips32/mips64/mips64r2 mabi=64 EL msoft-float
+MULTILIB_DIRNAMES = mips32 mips64 mips64r2 64 el sof
 MULTILIB_MATCHES = EL=mel EB=meb
+
+# The 64 bit ABI is not supported on the mips32 architecture.
+MULTILIB_EXCEPTIONS += *mips32*/*mabi=64*
+
+# The 64 bit ABI is not supported on the mips32r2 architecture.
+# Because mips32r2 is the default we can't use that flag to trigger
+# the exception so we check for mabi=64 with no specific mips flag
+# instead.
+MULTILIB_EXCEPTIONS += mabi=64*

Re: [PATCH][i386]Fix PR 57756

2013-10-17 Thread Steve Ellcey

On Thu, 2013-10-17 at 06:03 -0700, Diego Novillo wrote:
> On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn  wrote:
> 
> > How is Google going to change its patch commit policies to ensure that
> > this does not happen again?
> 
> There is nothing to change.  Google follows
> http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed
> the oversight and if there is any other fallout from his patch, he
> will address it.
> 
> I don't see anything out of the ordinary here.
> 
> 
> Diego.

FYI: I just want to make sure everyone is aware that there are still
broken targets.  rs6000 may be fixed but mips still doesn't work and
I presume other platforms like sparc are also still broken.

/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c: In
function 'void cpp_atomic_builtins(cpp_reader*)':
/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:594:7: 
error: 'target_flags_explicit' was not declared in this scope
/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:606:7: 
error: 'target_flags_explicit' was not declared in this scope
/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:618:7: 
error: 'target_flags_explicit' was not declared in this scope
/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:630:7: 
error: 'target_flags_explicit' was not declared in this scope
make[1]: *** [c-family/c-cppbuiltin.o] Error 1

Sriraman, are you looking at how to fix this globally or are the target
maintainers expected to fix this?  Currently I am using this one line
patch to get things building, but I presume this is not what we want
long term.


% git diff opth-gen.awk
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index 01c5ab6..46bd570 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -114,6 +114,7 @@ print "};"
 print "extern struct gcc_options global_options;"
 print "extern const struct gcc_options global_options_init;"
 print "extern struct gcc_options global_options_set;"
+print "#define target_flags_explicit global_options_set.x_target_flag
s"
 print "#endif"
 print "#endif"
 print ""


Steve Ellcey
sell...@mips.com

Re: [PATCH][i386]Fix PR 57756

2013-10-17 Thread Steve Ellcey

On Thu, 2013-10-17 at 11:06 -0700, Xinliang David Li wrote:
> JBG, Steve, Can you help testing Sri's latest patch on your targets?
> This will help speed up the process.
> 
> thanks,
> 
> David

I have already tested the one line patch to opth-gen.awk.  It fixes the
MIPS build and a testsuite run looked OK too.

Steve Ellcey
sell...@mips.com

[patch, mips] Fix optimization bug involving nor instruction

2013-10-22 Thread Steve Ellcey

While looking at some MIPS code I found that GCC was not making optimal
use of the MIPS nor instruction.  If you compile this function:

int f (int a, int b) { return ~(a|b); }

It generates:

or  $2,$4,$5
nor $2,$0,$2

instead of just:

nor $2,$4,$5

The problem is that mips_rtx_costs does not know that (AND (NOT op1) (NOT op2))
can be done in a single operation so it calculates the cost of the expression
as the cost of 3 operations, this results in combine thinking that
(NOT (OR op1 op2) is cheaper.

This patch changes mips_rtx_costs to change the cost calculation of a nor to
be the cost of one operation (plus the cost of getting the operands into
registers) instead of the cost of three operations.

Tested with no regressions, OK to checkin?

Steve Ellcey
sell...@mips.com


2013-10-22  Steve Ellcey  

* config/mips/mips.c (mips_rtx_costs):  Fix cost estimate for nor
(AND (NOT OP1) (NOT OP2)).


2013-10-22  Steve Ellcey  

* gcc.target/mips/nor.c: New.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5993aab..ffb0b53 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3796,6 +3796,18 @@ mips_rtx_costs (rtx x, int code, int outer_code, int 
opno ATTRIBUTE_UNUSED,
  return true;
}
}
+  /* (AND (NOT op0) (NOT op1) is a nor operation that can be done in
+a single instruction.  */
+  if (!TARGET_MIPS16 && (GET_CODE (XEXP (x, 0)) == NOT)
+ && (GET_CODE (XEXP (x, 1)) == NOT))
+   {
+ rtx op0 = XEXP (x, 0);
+ rtx op1 = XEXP (x, 1);
+ cost = GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 2 : 1;
+  *total = COSTS_N_INSNS (cost) + set_src_cost (XEXP (op0, 0), speed)
+  + rtx_cost (XEXP (op1, 0), GET_CODE (op1), 1, speed);
+ return true;
+   }

   /* Fall through.  */
 
diff --git a/gcc/testsuite/gcc.target/mips/nor.c 
b/gcc/testsuite/gcc.target/mips/nor.c
new file mode 100644
index 000..e71791b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/nor.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { scan-assembler-times "\tnor\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tor" } } */
+
+/* Test that we generate a 'nor' instruction and no 'or' instructions.  */
+
+NOMIPS16 int f (int a, int b)
+{
+   return ~(a|b);
+}

Re: [patch, mips] Fix optimization bug involving nor instruction

2013-10-22 Thread Steve Ellcey

On Tue, 2013-10-22 at 18:34 +0100, Richard Sandiford wrote:
> Good spot!

> We should use set_src_cost for both operands (the idea being that only
> a register is allowed, and that anything else will end up being a SET_SRC).
> I think the formatting should be something like:
> 
>   /* (AND (NOT op0) (NOT op1)) is a NOR operation that can be done in
>a single instruction.  */
>   if (!TARGET_MIPS16
> && GET_CODE (XEXP (x, 0)) == NOT
> && GET_CODE (XEXP (x, 1)) == NOT)
>   {
> cost = GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 2 : 1;
> *total = (COSTS_N_INSNS (cost)
>   + set_src_cost (XEXP (XEXP (x, 0), 0), speed)
>   + set_src_cost (XEXP (XEXP (x, 1), 0), speed));
> return true;
>   }
> 
> OK with that change, thanks.
> 
> Richard

OK, but I am curious why you put parenthesis around the right hand side
of the total expression.  I.e.  *total = ();

Steve Ellcey

Re: [patch, mips] Fix optimization bug involving nor instruction

2013-10-22 Thread Steve Ellcey

On Tue, 2013-10-22 at 19:12 +0100, Richard Sandiford wrote:

> >> Richard
> >
> > OK, but I am curious why you put parenthesis around the right hand side
> > of the total expression.  I.e.  *total = ();
> 
> That's the "emacs formatting" rule:

OK, I have checked in the patch with the parenthesis (and your other
changes).

Steve

Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib

2013-10-24 Thread Steve Ellcey

On Thu, 2013-10-24 at 15:53 +0100, Marcus Shawcroft wrote:

> Steve,
> 
> Can your build be fixed allowing us to back out:
> http://gcc.gnu.org/ml/fortran/2013-06/msg00038.html
> 
> ?
> 
> I'd really like to make some progress on this, while my proposed patch
> does resolve the regression introduced by the above patch I am
> concerned that this is going in the wrong direction and that we
> should, as Mike suggests above fix the build issue such that autoconf
> behaves, rather than attempting to hardwire configure details of
> newlib into libgfortran...
> 
> /Marcus

I am not sure how we would fix the build issue to allow us to not
hardcode the newlib configure details into the libgfortran configure
script.  The linker script that needs to be used to get a good link is
different depending on what options one is compiling with on a multilib
MIPS system and I have no idea on how one could create (or use) a dummy
linker script.  Ideally, I think we would want a check that does not
depend on linking at all.

Note that this problem is not just in libgfortran, it affects libstdc++
and libjava too and those configure scripts also have hardcoded the
newlib assumptions.  The only difference between libgfortran and the
other two libraries is that the newlib assumptions for libgfortran are
not static like they are for libstdc++ and libjava.  They vary (for one
function) based on whether or not long double is supported.

If we want to come up with a better solution it should be used for all
the libraries and not just for libgfortran.

Steve Ellcey
sell...@mips.com

[Patch] Fix canadian cross build on systems with no fenv.h

2013-10-30 Thread Steve Ellcey


I ran into a build problem while doing a canadian cross build of GCC.
I was building on linux to create a Windows (mingw) GCC that generates
code for mips-mti-elf.

The mips-mti-elf target uses newlib for its system headers and libraries
and the headers do not include a fenv.h header file.  However, when doing
a canadian cross build libstdc++ is built with the C++ compiler that runs
on linux (the build system), not the C++ that was just built for Windows
(the host system).  The problem is that the libstdc++ configure script
(in GLIBCXX_CHECK_C99_TR1) is checking for the fenv.h using C++ (not C)
and the C++ compiler running on linux does have an fenv.h header because
the latest libstdc++ builds always create this header for C++ regardless
of whether there is one on the system or not.  This results in the newly
created libstdc++ library having a fenv.h header that tries to include the
system fenv.h header file which does not exist and the build fails.

My fix for this is to explicitly check for fenv.h and complex.h in the
libstdc++ configure.ac script before calling GLIBCXX_CHECK_C99_TR1.
This makes the configure script check for these headers using the C
compiler instead of the C++ compiler and when GLIBCXX_CHECK_C99_TR1
is run it uses that information (saved in a autoconf variable) to 
correctly ascertain that fenv.h does not exist.

I could put these checks in GLIBCXX_CHECK_C99_TR1 if that were considered
preferable, it would just have to be done before we call 'AC_LANG_CPLUSPLUS'.

Tested with both my canadian cross build and a standard cross build
targetting mips-mti-elf.

OK for checkin?

Steve Ellcey
sell...@mips.com


2013-10-30  Steve Ellcey  

* configure.ac: Add header checks for fenv.h and complex.h.
* configure: Regenerate.


diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index dd13b01..22fc840 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -195,6 +195,12 @@ GLIBCXX_CHECK_S_ISREG_OR_S_IFREG
 AC_CHECK_HEADERS(sys/uio.h)
 GLIBCXX_CHECK_WRITEV
 
+# Check for fenv.h and complex.h before GLIBCXX_CHECK_C99_TR1
+# so that the check is done with the C compiler (not C++).
+# Checking with C++ can break a canadian cross build if either
+# file does not exist in C but does in C++.
+AC_CHECK_HEADERS(fenv.h complex.h)
+
 # For C99 support to TR1.
 GLIBCXX_CHECK_C99_TR1

[Patch] Fix type of clock_t in timevar.c

2013-11-01 Thread Steve Ellcey


While working on a canadian cross build I ran into a problem with the
type of clock_t.  If HAVE_CLOCK_T is not defined timevar.c defines
it to be int.  I think the type should be long.  I am using the mingw
compilers on linux in my canadian cross build and HAVE_CLOCK_T is not 
getting set but when building I get a conflict between the definition
of clock_t in /usr/i586-mingw32msvc/include/time.h (long) and the
definition in timevar.c.

Since mingw and linux and glibc and newlib all seem to agree on long
I would like to change timevar.c to agree with them.

Tested with my canadian cross build and a native x86 linux build.

OK to checkin?

Steve Ellcey
sell...@mips.com



2013-11-01  Steve Ellcey  

* timevar.c: Fix type of clock_t.


diff --git a/gcc/timevar.c b/gcc/timevar.c
index 23b7118..b66f94a 100644
--- a/gcc/timevar.c
+++ b/gcc/timevar.c
@@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "timevar.h"
 
 #ifndef HAVE_CLOCK_T
-typedef int clock_t;
+typedef long clock_t;
 #endif
 
 #ifndef HAVE_STRUCT_TMS

[Patch] configure patch for caddr_t and ssize_t types

2013-11-01 Thread Steve Ellcey

While doing a canadian cross build I ran into a problem with the caddr_t
type.  configure.ac is using an obsolete version of AC_CHECK_TYPE to
create #define's of caddr_t and ssize_t if they are not defined by the
system.  In addition to using an obsolete version of AC_CHECK_TYPE this
was causing a problem in my build because caddr_t was also getting set via
a typedef by the mingw compilers.

Here is my fix, tested with a canadian cross build and with a native x86
linux build.

OK to checkin?


2013-11-01  Steve Ellcey  

* configure.ac: Add header checks for fenv.h and complex.h.
* configure: Regenerate.
* config.in: Regnerate.
* system.h: Add default caddr_t and ssize_t typedefs.


diff --git a/gcc/configure.ac b/gcc/configure.ac
index 5e686db..bc87073 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1075,8 +1075,8 @@ int main()
   fi
 fi
 
-AC_CHECK_TYPE(ssize_t, int)
-AC_CHECK_TYPE(caddr_t, char *)
+AC_CHECK_TYPES([ssize_t])
+AC_CHECK_TYPES([caddr_t])
 
 GCC_AC_FUNC_MMAP_BLACKLIST
 
diff --git a/gcc/system.h b/gcc/system.h
index a1fc6de..108ec0e 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -1060,6 +1060,14 @@ helper_const_non_const_cast (const char *p)
 #define DEBUG_VARIABLE
 #endif
 
+#ifndef HAVE_CADDR_T
+typedef char *caddr_t;
+#endif
+
+#ifndef HAVE_SSIZE_T
+typedef int ssize_t
+#endif
+
 /* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT.  */
 #include "hwint.h"

Re: [Patch] configure patch for caddr_t and ssize_t types

2013-11-01 Thread Steve Ellcey

On Fri, 2013-11-01 at 19:22 +0100, Marek Polacek wrote:
> On Fri, Nov 01, 2013 at 11:15:02AM -0700, Steve Ellcey  wrote:
> > --- a/gcc/system.h
> > +++ b/gcc/system.h
> > @@ -1060,6 +1060,14 @@ helper_const_non_const_cast (const char *p)
> >  #define DEBUG_VARIABLE
> >  #endif
> >  
> > +#ifndef HAVE_CADDR_T
> > +typedef char *caddr_t;
> > +#endif
> > +
> > +#ifndef HAVE_SSIZE_T
> > +typedef int ssize_t
> > +#endif
> 
> Missing ';'?
> 
>   Marek

Yes.  Both my cross build and my native build set HAVE_SSIZE_T so I
didn't see this.  I really only changed ssize_t to make it consistent
with caddr_t (which was causing me a problem) and to remove the use of
the obsolete form of AC_CHECK_TYPE.

Steve Ellcey
sell...@mips.com

Re: [Patch] Fix type of clock_t in timevar.c

2013-11-01 Thread Steve Ellcey

On Fri, 2013-11-01 at 12:43 -0700, Mike Stump wrote:
> On Nov 1, 2013, at 10:47 AM, Steve Ellcey  wrote:
> > While working on a canadian cross build I ran into a problem with the
> > type of clock_t.  If HAVE_CLOCK_T is not defined
> 
> > timevar.c defines it to be int.  I think the type should be long.  I am 
> > using the mingw
> > compilers on linux in my canadian cross build and HAVE_CLOCK_T is not 
> > getting set but when building I get a conflict between the definition
> > of clock_t in /usr/i586-mingw32msvc/include/time.h (long) and the
> > definition in timevar.c.
> 
> You should report a bug to them and have them define clock_t.

They are defining clock_t, but for some reason the GCC configure is not
seeing it (perhaps because of what header files get included).

> > Since mingw and linux and glibc and newlib all seem to agree on long
> > I would like to change timevar.c to agree with them.
> 
> Seems reasonable to me, though newlib and macosx use unsigned long.  glibc
> remains at long, due to binary compatibility.

Ah, I confused _CLOCK_T_ (unsigned long) with  __CLOCK_T_TYPE (long) in
newlib.  clock_t is defined as _CLOCK_T_ in newlib, not __CLOCK_T_TYPE.

Steve Ellcey
sell...@mips.com

Re: [Patch] Fix type of clock_t in timevar.c

2013-11-01 Thread Steve Ellcey

On Fri, 2013-11-01 at 13:45 -0700, Mike Stump wrote:
> On Nov 1, 2013, at 12:56 PM, Steve Ellcey  wrote:
> >> You should report a bug to them and have them define clock_t.
> > 
> > They are defining clock_t, but for some reason the GCC configure is not
> > seeing it (perhaps because of what header files get included).
> 
> Curious, do you have sys/time.h?  If so, that's likely the cause, as system.h 
> does this:
> 
> # if HAVE_SYS_TIME_H
> #  include 
> # else
> #  ifdef HAVE_TIME_H
> #   include 
> #  endif
> # endif
> 
> thus cleverly avoiding even including time.h when it exists.  I suspect
> TIME_WITH_SYS_TIME failed for some silly reason, though, can't guess why.
> autoconf is fun when it fails.

Yes, mingw does have sys/time.h, but their sys/time.h includes 
so it should work.  And now when I try to reproduce the problem it seems
to work.  I am getting HAVE_CLOCK_T defined in auto-build.h and
auto-host.h.

Steve Ellcey
sell...@mips.com

[Patch, MIPS] Patch for PR 68400, a mips16 bug

2016-01-26 Thread Steve Ellcey

Here is a patch for PR6400.  The problem is that and_operands_ok was checking
one operand to see if it was a memory_operand but MIPS16 addressing is more
restrictive than what the general memory_operand allows.  The fix was to
call mips_classify_address if TARGET_MIPS16 is set because it will do a
more complete mips16 addressing check and reject operands that do not meet
the more restrictive requirements.

I ran the GCC testsuite with no regressions and have included a test case as
part of this patch.

OK to checkin?

Steve Ellcey
sell...@imgtec.com


2016-01-26  Steve Ellcey  

PR target/68400
* config/mips/mips.c (and_operands_ok): Add MIPS16 check.



diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index dd54d6a..adeafa3 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -8006,9 +8006,18 @@ mask_low_and_shift_p (machine_mode mode, rtx mask, rtx 
shift, int maxlen)
 bool
 and_operands_ok (machine_mode mode, rtx op1, rtx op2)
 {
-  return (memory_operand (op1, mode)
- ? and_load_operand (op2, mode)
- : and_reg_operand (op2, mode));
+
+  if (memory_operand (op1, mode))
+{
+  if (TARGET_MIPS16) {
+   struct mips_address_info addr;
+   if (!mips_classify_address (&addr, op1, mode, false))
+ return false;
+  }
+  return and_load_operand (op2, mode);
+}
+  else
+return and_reg_operand (op2, mode);
 }
 
 /* The canonical form of a mask-low-and-shift-left operation is




2016-01-26  Steve Ellcey  

PR target/68400
* gcc.target/mips/mips.exp (mips_option_groups): Add stack-protector.
* gcc.target/mips/pr68400.c: New test.


diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index f191331..ff9c99a 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -257,6 +257,7 @@ set mips_option_groups {
 lsa "(|!)HAS_LSA"
 section_start "-Wl,--section-start=.*"
 frame-header "-mframe-header-opt|-mno-frame-header-opt"
+stack-protector "-fstack-protector"
 }
 
 for { set option 0 } { $option < 32 } { incr option } {
diff --git a/gcc/testsuite/gcc.target/mips/pr68400.c 
b/gcc/testsuite/gcc.target/mips/pr68400.c
index e69de29..1099568 100644
--- a/gcc/testsuite/gcc.target/mips/pr68400.c
+++ b/gcc/testsuite/gcc.target/mips/pr68400.c
@@ -0,0 +1,28 @@
+/* PR target/pr68400
+   This was triggering an ICE in change_address_1 when compiled with -Os.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fstack-protector -mips16" } */
+
+typedef struct s {
+ unsigned long long d;
+ long long t;
+} p;
+
+int sh(int x, unsigned char *buf)
+{
+ p *uhdr = (p *)buf;
+ unsigned int i = 0;
+ uhdr->d = ((uhdr->d & 0xff00LL) >> 56)
+| ((uhdr->d & 0xff00LL) >> 24)
+| ((uhdr->d & 0xff00LL) << 8)
+| ((uhdr->d & 0x00ffLL) << 56);
+ uhdr->t = ((uhdr->t & 0xff00LL) >> 56)
+| ((uhdr->t & 0xff00LL) >> 24)
+| ((uhdr->t & 0x00ffLL) >> 8)
+| ((uhdr->t & 0xff00LL) << 8)
+| ((uhdr->t & 0xff00LL) << 40)
+| ((uhdr->t & 0x00ffLL) << 56);
+ i += 4;
+ if (x < i) return 0; else return 1;
+}

Re: [PATCH] libcpp: use better locations for _Pragma tokens (preprocessor/69126)

2016-01-28 Thread Steve Ellcey

David,

This patch has broken the top-of-tree glibc build.  I get an error on an
undef that is supposed to be protected by Pragmas.  It happens when
compiling intl/plural.c in glibc.  I created a preprocessed source file
but when I compile it, the error doesn't happen.  I will try to create a
small unpreprocessed standalone test case but maybe this is enough for
you to know what is happening.

If I look at the preprocessed code, the statement in question is
under ingore warning pragmas.

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wuninitialized"
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
  *++yyvsp = __gettextlval;
#pragma GCC diagnostic pop


Steve Ellcey
sell...@imgtec.com

plural.c: In function '__gettextparse':
plural.c:1767:12: error: '__gettextlval.num' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
   *++yyvsp = yylval;
 
plural.c:66:25: note: '__gettextlval.num' was declared here
 #define yylval  __gettextlval
 ^
plural.c:1265:9: note: in expansion of macro 'yylval'
 YYSTYPE yylval YY_INITIAL_VALUE(yyval_default);
 ^~
cc1: all warnings being treated as errors

Re: [PATCH] libcpp: use better locations for _Pragma tokens (preprocessor/69126)

2016-01-28 Thread Steve Ellcey

On Thu, 2016-01-28 at 14:59 -0500, David Malcolm wrote:
> On Thu, 2016-01-28 at 14:48 -0500, David Malcolm wrote:
> > On Thu, 2016-01-28 at 11:12 -0800, Steve Ellcey wrote:
> > > David,
> > > 
> > > This patch has broken the top-of-tree glibc build
> > 
> > Bother; sorry.  FWIW, I'm about to get on a plane (for FOSDEM), so
> > I'm
> > not in a great position to tackle this yet.
> 
> [...]
> 
> I've filed PR preprocessor/69543 to cover this regression:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543
> 
> Should the patches be reverted in the meantime?(r232893 and r232928)

I can live with the problem for a few days, so guess it depends on when
you think you will have time to look at it more.

Steve Ellcey

[Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-01-29 Thread Steve Ellcey


This is a patch for PR 68273, where MIPS is passing arguments in the wrong
registers.  The problem is that when passing an argument by value,
mips_function_arg_boundary was looking at the type of the argument (and
not just the mode), and if that argument was a variable with extra alignment
info (say an integer variable with 8 byte alignment), MIPS was aligning the
argument on an 8 byte boundary instead of a 4 byte boundary the way it should.

Since we are passing values (not variables), the alignment of the variable
that the value is copied from should not affect the alignment of the value
being passed.

This patch fixes the problem and it could change what registers arguments
are passed in, which means it could affect backwards compatibility with
older programs.  But since the current behaviour is not compliant with the
MIPS ABI and does not match what LLVM does, I think we have to make this
change.  For the most part this will only affect arguments which are copied
from variables that have non-standard alignments set by the aligned attribute,
however the SRA optimization pass can create aligned variables as it splits
aggregates apart and that was what triggered this bug report.

This is basically the same bug as the ARM bug PR 65956 and the fix is
pretty much the same too.

Rather than create MIPS specific tests that check the use of specific
registers I created two tests to put in gcc.c-torture/execute that
were failing before because GCC on MIPS was not consistent in where
arguments were passed and which now work with this patch.

Tested with mips-mti-linux-gnu and no regressions.  OK to checkin?

Steve Ellcey
sell...@imgtec.com


2016-01-29  Steve Ellcey  

PR target/68273
* config/mips/mips.c (mips_function_arg_boundary): Fix argument
alignment.



diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index dd54d6a..ecce3cd 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -5643,8 +5643,9 @@ static unsigned int
 mips_function_arg_boundary (machine_mode mode, const_tree type)
 {
   unsigned int alignment;
-
-  alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode);
+  alignment = type && mode == BLKmode
+ ? TYPE_ALIGN (TYPE_MAIN_VARIANT (type))
+ : GET_MODE_ALIGNMENT (mode);
   if (alignment < PARM_BOUNDARY)
 alignment = PARM_BOUNDARY;
   if (alignment > STACK_BOUNDARY)


2016-01-29  Steve Ellcey  

PR target/68273
* gcc.c-torture/execute/pr68273-1.c: New test.
* gcc.c-torture/execute/pr68273-2.c: New test.


diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c
index e69de29..3ce07c6 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c
@@ -0,0 +1,74 @@
+/* Make sure that the alignment attribute on an argument passed by
+   value does not affect the calling convention and what registers
+   arguments are passed in.  */
+
+extern void exit (int);
+extern void abort (void);
+
+typedef int alignedint __attribute__((aligned(8)));
+
+int  __attribute__((noinline))
+foo1 (int a, alignedint b)
+{ return a + b; }
+
+int __attribute__((noinline))
+foo2 (int a, int b)
+{
+  return a + b;
+}
+
+int __attribute__((noinline))
+bar1 (alignedint x)
+{
+  return foo1 (1, x);
+}
+
+int __attribute__((noinline))
+bar2 (alignedint x)
+{
+  return foo1 (1, (alignedint) 99);
+}
+
+int __attribute__((noinline))
+bar3 (alignedint x)
+{
+  return foo1 (1, x + (alignedint) 1);
+}
+
+alignedint q = 77;
+
+int __attribute__((noinline))
+bar4 (alignedint x)
+{
+  return foo1 (1, q);
+}
+
+
+int __attribute__((noinline))
+bar5 (alignedint x)
+{
+  return foo2 (1, x);
+}
+
+int __attribute__((noinline))
+use_arg_regs (int i, int j, int k)
+{
+  return i+j-k;
+}
+
+int main()
+{
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (foo1 (19,13) != 32) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar1 (-33) != -32) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar2 (1) != 100) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar3 (17) != 19) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar4 (-33) != 78) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar5 (-84) != -83) abort ();
+   exit (0);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c 
b/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c
index e69de29..1661be9 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c
@@ -0,0 +1,109 @@
+/* Make sure that the alignment attribute on an argument passed by
+   value does not affect the calling convention and what registers
+   arguments are passed in.  */
+
+extern void exit (int);
+extern void abort (void);
+
+typedef struct s {
+   char c;
+   char d;
+} t1;
+typedef

Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-01 Thread Steve Ellcey

On Sat, 2016-01-30 at 11:06 +, Richard Sandiford wrote:
> I'm not sure this patch is safe.  The received wisdom used to be that
> ABIs should be defined in terms of types, not modes, since types
> represent the source code while modes are an internal GCC concept
> that could change over time (although in practice the barrier for
> that is pretty high).
> 
> The patch that introduced the line being patched was part of a series
> that fixed ABI incompatibilities with MIPSpro, and IIRC some of the
> incompatibilties really were due to us looking at modes when we should
> have been looking at types.

My main reason for looking at mode instead of type for non-aggregates
was because of argument promotion.  When I looked at a function with a
char argument, the type was char but the mode was SI.  Now I could still
use the alignment of char (1 byte) since it would get extended to 4
bytes in mips_function_arg_boundary by:

  if (alignment < PARM_BOUNDARY)
alignment = PARM_BOUNDARY;

But it still seemed a bit 'wrong' to me.  Maybe something in the
front/middle ends should be updating the type to match any argument
promotion that is being done?

Steve Ellcey
sell...@imgtec.com

Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-02 Thread Steve Ellcey

On Sat, 2016-01-30 at 11:06 +, Richard Sandiford wrote:

> We need to be careful of examples like:
> 
>   struct __attribute__ ((aligned (8))) s { _Complex float x; };
>   void foo (struct s *ptr, struct s val) { *ptr = val; }
> 
> "x" gets SCmode, which has an alignment of 4.  And it's OK for TYPE_MODE
> to have a smaller alignment than the type -- it's just not allowed to
> have a larger alignment (and even that restriction only applies because
> this is a STRICT_ALIGNMENT target).  So the structure itself inherits
> this SCmode.
> 
> The patch therefore changes how we handle foo() for -mabi=32 -msoft-float.
> Before the patch "val" is passed in $6 and $7.  After the patch it's
> passed in $5 and $6.  clang behaves like the unpatched GCC.
> 
> If instead we use:
> 
>   struct __attribute__ ((aligned (8))) s { float x; float y; };
>   void foo (struct s *ptr, struct s val) { *ptr = val; }
> 
> then the structure has BLKmode and the alignment is honoured both before
> and after the patch.
> 
> There's no real ABI reason for handling the two cases differently.
> The fact that one gets BLKmode and the other doesn't is down
> to GCC internals.
> 
> We also have to be careful about the STRICT_ALIGNMENT thing.
> At the moment that's hard-coded to 1 for MIPS, but it's possible that
> it could become configurable in future, like it is for aarch64 and
> rs6000.  !STRICT_ALIGNMENT allows TYPE_MODE to have a larger
> alignment than the type, so underaligned structures would get modes
> when they didn't previously.  That would in turn change how they're
> passed as arguments.
> 
> Thanks,
> Richard

Richard,

Can you explain why the GCC internals cause us to get SCmode instead of
BLKmode for the example with _Complex?  I don't understand that.  It
seems wrong to me and I don't understand where it is coming from.

Steve Ellcey
sell...@imgtec.com

Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-03 Thread Steve Ellcey


Here is a new patch for PR target/68273.  It makes the GCC calling conventions
compatible with LLVM so that the two agree with each other in all the cases
I could think of testing and it fixes the reported defect.

I couldn't get the GCC compatibility test to work (see
https://gcc.gnu.org/ml/gcc/2016-02/msg00017.html) so I wasn't able to use
that for compatibility testing, instead I hand examined routines with various
argument types (structures, ints, complex, etc) to see what registers 
GCC and LLVM were accessing.

This change does introduce an ABI/compatibility change with GCC itself and
that is obviously a concern.  Basically, any type that is passed by value
and has an external alignment applied to it may get passed in different
registers because we strip off that alignment (via TYPE_MAIN_VARIANT) before
determining the alignment of the variable.

If we don't want to break the GCC compatibility then we will continue to have an
incompatibility with LLVM and we will need to find another way to deal
with the aligned int variable that SRA is creating and passing to a function
that expects a 'normal' integer.

One thought I had was that we could compare TYPE_ALIGN (type) and
TYPE_ALIGN (TYPE_MAIN_VARIANT (type) and if they are different, issue
a warning during compilation about a possible incompatibility with
older objects.

Steve Ellcey
sell...@imgtec.com


2016-02-03  Steve Ellcey  

PR target/68273
* config/mips/mips.c (mips_function_arg_boundary): Fix argument
alignment.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 697abc2..4aa215f 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -5644,7 +5644,10 @@ mips_function_arg_boundary (machine_mode mode, 
const_tree type)
 {
   unsigned int alignment;
 
-  alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode);
+  alignment = type
+   ? TYPE_ALIGN (TYPE_MAIN_VARIANT (type))
+   : GET_MODE_ALIGNMENT (mode);
+
   if (alignment < PARM_BOUNDARY)
 alignment = PARM_BOUNDARY;
   if (alignment > STACK_BOUNDARY)




2016-02-03  Steve Ellcey  

PR target/68273
* gcc.c-torture/execute/pr68273-1.c: New test.
* gcc.c-torture/execute/pr68273-2.c: New test.


diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c
index e69de29..3ce07c6 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c
@@ -0,0 +1,74 @@
+/* Make sure that the alignment attribute on an argument passed by
+   value does not affect the calling convention and what registers
+   arguments are passed in.  */
+
+extern void exit (int);
+extern void abort (void);
+
+typedef int alignedint __attribute__((aligned(8)));
+
+int  __attribute__((noinline))
+foo1 (int a, alignedint b)
+{ return a + b; }
+
+int __attribute__((noinline))
+foo2 (int a, int b)
+{
+  return a + b;
+}
+
+int __attribute__((noinline))
+bar1 (alignedint x)
+{
+  return foo1 (1, x);
+}
+
+int __attribute__((noinline))
+bar2 (alignedint x)
+{
+  return foo1 (1, (alignedint) 99);
+}
+
+int __attribute__((noinline))
+bar3 (alignedint x)
+{
+  return foo1 (1, x + (alignedint) 1);
+}
+
+alignedint q = 77;
+
+int __attribute__((noinline))
+bar4 (alignedint x)
+{
+  return foo1 (1, q);
+}
+
+
+int __attribute__((noinline))
+bar5 (alignedint x)
+{
+  return foo2 (1, x);
+}
+
+int __attribute__((noinline))
+use_arg_regs (int i, int j, int k)
+{
+  return i+j-k;
+}
+
+int main()
+{
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (foo1 (19,13) != 32) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar1 (-33) != -32) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar2 (1) != 100) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar3 (17) != 19) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar4 (-33) != 78) abort ();
+   if (use_arg_regs (999, 999, 999) != 999) abort ();
+   if (bar5 (-84) != -83) abort ();
+   exit (0);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c 
b/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c
index e69de29..1661be9 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c
@@ -0,0 +1,109 @@
+/* Make sure that the alignment attribute on an argument passed by
+   value does not affect the calling convention and what registers
+   arguments are passed in.  */
+
+extern void exit (int);
+extern void abort (void);
+
+typedef struct s {
+   char c;
+   char d;
+} t1;
+typedef t1 t1_aligned8  __attribute__((aligned(8)));
+typedef t1 t1_aligned16 __attribute__((aligned(16)));
+typedef t1 t1_aligned32 __attribute__((aligned(32)));
+
+int bar1(int a, t1 b)
+{
+   return a + b.c + b.d;
+}
+
+int bar2(int a, t1_aligned8 b)
+{
+   return a + b.c + b.d;
+}
+

Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-04 Thread Steve Ellcey

On Thu, 2016-02-04 at 12:09 +0100, Eric Botcazou wrote:
> > So this doesn’t fix aarch64, c6x, epiphany, ia64, iq2000, rs6000, rx, sparc,
> > tilegx, tilepro or xtensa.
> > :-(  That’s one of the problems by having each port copy and paste swaths of
> > :code from other ports to express the same thing instead of ports sharing
> > :just one copy of code.  My port is also broken in the same way
> > :(currently).
> 
> Yes, fixing a compiler bug by changing the ABI is a no-no, and the argument 
> of 
> the compatibility with LLVM has IMO little merit since it's a GCC extension.

But it is a GCC extension that is implemented in LLVM.  It's just
implemented differently there.  We either have to change GCC, change
LLVM, or live with the difference.

In any of these cases I wonder if we should change GCC so that the code
below generates warnings.  If the two types are going to get passed
differently we shouldn't allow calls with one type to call a function
expecting the other type to happen with no warning or error.  Currently
this code does not warn even with -Wall.

typedef int alignedint __attribute__((aligned(8)));
extern void foo (alignedint a);
extern void bar (int);

int a;
alignedint b;

int main()
{
foo(a);
bar(b);
}

Steve Ellcey
sell...@imgtec.com

[Patch, MIPS] Patch for PR 68273 (user aligned variable arguments)

2016-02-24 Thread Steve Ellcey

Here is a new patch for PR 68273.  The original problem with gsoap has
been fixed by changing GCC to not create overly-aligned variables in
the SRA pass but the MIPS specific problem of how user-aligned variables
are passed to functions remains.

Because MIPS GCC is internally inconsistent, it seems like changing
the passing convention for user aligned variables on MIPS is the best
option, even though this is an ABI change for MIPS.

For example:

typedef int alignedint __attribute__((aligned(8)));
int foo1 (int x, alignedint y) { return x+y; }
int foo2 (int x, int y) { return x+y; }
int foo3 (int x, alignedint y) { return x+y; }
int foo4 (int x, int y) { return x+y; }
int splat1(int a, alignedint b) { foo1(a,b); }
int splat2(int a, alignedint b) { foo2(a,b); }
int splat3(int a, int b) { foo3(a,b); }
int splat4(int a, int b) { foo4(a,b); }

In this case foo1 and foo3 would expect the second argument to be in
register $6, but foo2 and foo4 would exect it in register $5.

Likewise splat1 and splat2 would pass the second argument in $6, but
splat3 and splat4 would pass it in $5.

This means that the call from splat2 to foo2 and the call from splat3
to foo3 would be wrong in that the caller is putting the argument in
one register but the callee is getting out of a different register.

In none of these cases would GCC give a warning about the inconsistent
parameter passing 

Since this patch could cause a change in the users program, I have
added a warning that will be emitted whenever a user passes an
aligned type or when a parameter is declared as an aligned type
and that alignment could cause a change in how the value is passed.

I also added a way to turn that warning off in case the user doesn't
want to see it (-mno-warn-aligned-args).  I did not add an option
to make GCC pass arguments in the old manner as I consider that method
of passing arguments to be a bug and I don't think we want to have an
option to propogate that incorrect behavior.

Steve Ellcey
sell...@imgtec.com


2016-02-24  Steve Ellcey  

PR target/68273
* config/mips/mips.opt (mwarn-aligned-args): New flag.
* config/mips/mips.h (mips_args): Add new field.
* config/mips/mips.c (mips_internal_function_arg): New function.
(mips_function_incoming_arg): New function.
(mips_old_function_arg_boundary): New function.
(mips_function_arg): Rewrite to use mips_internal_function_arg.
(mips_function_arg_boundary): Fix argument alignment.
(TARGET_FUNCTION_INCOMING_ARG): New define.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 697abc2..05465c1 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -1124,6 +1124,7 @@ static const struct mips_rtx_cost_data
 static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool);
 static int mips_register_move_cost (machine_mode, reg_class_t,
reg_class_t);
+static unsigned int mips_old_function_arg_boundary (machine_mode, const_tree);
 static unsigned int mips_function_arg_boundary (machine_mode, const_tree);
 static machine_mode mips_get_reg_raw_mode (int regno);
 
@@ -5459,11 +5460,11 @@ mips_strict_argument_naming (cumulative_args_t ca 
ATTRIBUTE_UNUSED)
   return !TARGET_OLDABI;
 }
 
-/* Implement TARGET_FUNCTION_ARG.  */
+/* Used to implement TARGET_FUNCTION_ARG and TARGET_FUNCTION_INCOMING_ARG.  */
 
 static rtx
-mips_function_arg (cumulative_args_t cum_v, machine_mode mode,
-  const_tree type, bool named)
+mips_internal_function_arg (cumulative_args_t cum_v, machine_mode mode,
+   const_tree type, bool named)
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
   struct mips_arg_info info;
@@ -5586,6 +5587,50 @@ mips_function_arg (cumulative_args_t cum_v, machine_mode 
mode,
   return gen_rtx_REG (mode, mips_arg_regno (&info, TARGET_HARD_FLOAT));
 }
 
+/* Implement TARGET_FUNCTION_ARG.  */
+
+static rtx
+mips_function_arg (cumulative_args_t cum_v, machine_mode mode,
+  const_tree type, bool named)
+{
+  bool doubleword_aligned_p = (mips_function_arg_boundary (mode, type)
+ > BITS_PER_WORD);
+  bool old_doubleword_aligned_p = (mips_old_function_arg_boundary (mode, type)
+ > BITS_PER_WORD);
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+
+  if (doubleword_aligned_p != old_doubleword_aligned_p
+  && mips_warn_aligned_args && !cum->aligned_arg_warning_given)
+{
+  warning (0, "argument %d in the call may be passed in a manner 
incompatible with previous GCC versions", cum->arg_number+1);
+  cum->aligned_arg_warning_given = 1;
+}
+
+  return mips_internal_function_arg (cum_v, mode, type, named);
+}
+
+/* Implement TARGET_FUNCTION_INCOMING_ARG.  */
+
+static rtx
+mips_function_i

RE: [Patch, MIPS] Patch for PR 68273 (user aligned variable arguments)

2016-02-24 Thread Steve Ellcey

On Wed, 2016-02-24 at 13:46 -0800, Matthew Fortune wrote:

> Thanks for enumerating all the cases. I'd not looked at all of them. I
> do agree that we need a fix given the existing inconsistencies.
> 
> One question I have is where does an over aligned argument get pushed
> to the stack with the patch in place. I.e. when taking its address, is
> the alignment achieved up to the limit of stack alignment or do they now
> only get register-size alignment? If the former then the idea that
> argument passing is defined as a structure in memory with the first
> portion in registers will no longer hold true. Not sure if that is a
> problem.

Passing arguments on the stack is not affected by this change, except
that we may start using memory at a different argument because of
changes in the number of registers we use to pass arguments.  The extra
alignment was already being ignored for arguments passed on the stack
and was only affecting register arguments.  So we are now more
consistent in terms of argument passing looking like a structure.  Those
structure 'fields' now do not include the user defined alignment for
register args or memory args.  An example:

typedef int alignedint __attribute__((aligned(8)));
int foo1 (int a, int b, int c, int d, int e, int x, alignedint y, int z)
{ return a+b+c+d+e+x+y+z; }

e, x, y, and z were being passed at offsets 16, 20, 24, and 28 before
this change and after this change.

> > diff --git a/gcc/testsuite/gcc.dg/pr48335-2.c
> > b/gcc/testsuite/gcc.dg/pr48335-2.c
> > index a37c079..f56b6fd 100644
> > --- a/gcc/testsuite/gcc.dg/pr48335-2.c
> > +++ b/gcc/testsuite/gcc.dg/pr48335-2.c
> > @@ -1,6 +1,7 @@
> >  /* PR middle-end/48335 */
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -fno-tree-sra" } */
> > +/* { dg-options "-O2 -fno-tree-sra -mno-warn-aligned-args" { target
> > mips*-*-* } } */
> 
> Presumably this means that both under alignment of 64-bit types and over
> alignment of < 64-bit types are affected? The explicit test cases do not
> cover an under aligned long long case which would be good to cover.

Yes, the under alignment of long long's is affected.  I have added a new
testcase (pr68273-3.c) to check that.

Here is a new version of the patch with the invoke.texi text, the new
testcase, and the style changes you pointed out.

Steve Ellcey
sell...@imgtec.com


2016-02-24  Steve Ellcey  

PR target/68273
* config/mips/mips.opt (mwarn-aligned-args): New flag.
* config/mips/mips.h (mips_args): Add new field.
* config/mips/mips.c (mips_internal_function_arg): New function.
(mips_function_incoming_arg): New function.
(mips_old_function_arg_boundary): New function.
(mips_function_arg): Rewrite to use mips_internal_function_arg.
(mips_function_arg_boundary): Fix argument alignment.
(TARGET_FUNCTION_INCOMING_ARG): New define.
* doc/invoke.texi (MIPS Options): Document new flag.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5af3d1e..ea4733b 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -1124,6 +1124,7 @@ static const struct mips_rtx_cost_data
 static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool);
 static int mips_register_move_cost (machine_mode, reg_class_t,
reg_class_t);
+static unsigned int mips_old_function_arg_boundary (machine_mode, const_tree);
 static unsigned int mips_function_arg_boundary (machine_mode, const_tree);
 static machine_mode mips_get_reg_raw_mode (int regno);
 
@@ -5459,11 +5460,11 @@ mips_strict_argument_naming (cumulative_args_t ca 
ATTRIBUTE_UNUSED)
   return !TARGET_OLDABI;
 }
 
-/* Implement TARGET_FUNCTION_ARG.  */
+/* Used to implement TARGET_FUNCTION_ARG and TARGET_FUNCTION_INCOMING_ARG.  */
 
 static rtx
-mips_function_arg (cumulative_args_t cum_v, machine_mode mode,
-  const_tree type, bool named)
+mips_internal_function_arg (cumulative_args_t cum_v, machine_mode mode,
+   const_tree type, bool named)
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
   struct mips_arg_info info;
@@ -5586,6 +5587,54 @@ mips_function_arg (cumulative_args_t cum_v, machine_mode 
mode,
   return gen_rtx_REG (mode, mips_arg_regno (&info, TARGET_HARD_FLOAT));
 }
 
+/* Implement TARGET_FUNCTION_ARG.  */
+
+static rtx
+mips_function_arg (cumulative_args_t cum_v, machine_mode mode,
+  const_tree type, bool named)
+{
+  bool doubleword_aligned_p = (mips_function_arg_boundary (mode, type)
+ > BITS_PER_WORD);
+  bool old_doubleword_aligned_p = (mips_old_function_arg_boundary (mode, type)
+ > BITS_PER_WORD);
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+
+  if (doubleword_aligned_p != old_doubl

Re: [PATCH, ia64] [PR target/52731] internal compiler error: in ia64_st_address_bypass_p, at config/ia64/ia64.c:9357

2013-12-03 Thread Steve Ellcey

On Fri, 2013-11-29 at 15:01 +0300, Kirill Yukhin wrote:
> Hello,
> On 20 Nov 18:37, Kirill Yukhin wrote:
> > Hello,
> > Patch in the bottom fixes PR52731.
> > Is it ok for trunk?
> Ping?
> 
> --
> Thanks, K

OK.

Steve Ellcey
sell...@mips.com

[Patch] Remove references to non-existent tree-flow.h file

2014-01-09 Thread Steve Ellcey

While looking at PR 59335 (plugin doesn't build) I saw the comments about
tree-flow.h and tree-flow-inline.h not existing anymore.  While these
files have been removed there are still some references to them in
Makefile.in, doc/tree-ssa.texi, and a couple of source files.  This patch
removes the references to these now-nonexistent files.

OK to checkin?

Steve Ellcey
sell...@mips.com


2014-01-09  Steve Ellcey  

* Makefile.in (TREE_FLOW_H): Remove.
(TREE_SSA_H): Add files names from tree-flow.h.
* doc/tree-ssa.texi (Annotations): Remove reference to tree-flow.h
* tree.h: Remove tree-flow.h reference.
* hash-table.h: Remove tree-flow.h reference.
* tree-ssa-loop-niter.c (dump_affine_iv): Replace tree-flow.h
reference with tree-ssa-loop.h.


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 459b1ba..8eb4f68 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -929,11 +929,10 @@ CPP_ID_DATA_H = $(CPPLIB_H) 
$(srcdir)/../libcpp/include/cpp-id-data.h
 CPP_INTERNAL_H = $(srcdir)/../libcpp/internal.h $(CPP_ID_DATA_H)
 TREE_DUMP_H = tree-dump.h $(SPLAY_TREE_H) $(DUMPFILE_H)
 TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H)
-TREE_FLOW_H = tree-flow.h tree-flow-inline.h tree-ssa-operands.h \
+TREE_SSA_H = tree-ssa.h tree-ssa-operands.h \
$(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \
$(HASHTAB_H) $(CGRAPH_H) $(IPA_REFERENCE_H) \
tree-ssa-alias.h
-TREE_SSA_H = tree-ssa.h $(TREE_FLOW_H)
 PRETTY_PRINT_H = pretty-print.h $(INPUT_H) $(OBSTACK_H)
 TREE_PRETTY_PRINT_H = tree-pretty-print.h $(PRETTY_PRINT_H)
 GIMPLE_PRETTY_PRINT_H = gimple-pretty-print.h $(TREE_PRETTY_PRINT_H)
diff --git a/gcc/doc/tree-ssa.texi b/gcc/doc/tree-ssa.texi
index 391dba8..e0238bd 100644
--- a/gcc/doc/tree-ssa.texi
+++ b/gcc/doc/tree-ssa.texi
@@ -53,9 +53,6 @@ variable has aliases.  All these attributes are stored in data
 structures called annotations which are then linked to the field
 @code{ann} in @code{struct tree_common}.
 
-Presently, we define annotations for variables (@code{var_ann_t}).
-Annotations are defined and documented in @file{tree-flow.h}.
-
 
 @node SSA Operands
 @section SSA Operands
diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 2b04067..034385c 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -1050,10 +1050,7 @@ hash_table ::end ()
 
 /* Iterate through the elements of hash_table HTAB,
using hash_table <>::iterator ITER,
-   storing each element in RESULT, which is of type TYPE.
-
-   This macro has this form for compatibility with the
-   FOR_EACH_HTAB_ELEMENT currently defined in tree-flow.h.  */
+   storing each element in RESULT, which is of type TYPE.  */
 
 #define FOR_EACH_HASH_TABLE_ELEMENT(HTAB, RESULT, TYPE, ITER) \
   for ((ITER) = (HTAB).begin (); \
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 5a10297..7628363 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1311,7 +1311,7 @@ dump_affine_iv (FILE *file, affine_iv *iv)
if EVERY_ITERATION is true, we know the test is executed on every iteration.
 
The results (number of iterations and assumptions as described in
-   comments at struct tree_niter_desc in tree-flow.h) are stored to NITER.
+   comments at struct tree_niter_desc in tree-ssa-loop.h) are stored to NITER.
Returns false if it fails to determine number of iterations, true if it
was determined (possibly with some assumptions).  */
 
diff --git a/gcc/tree.h b/gcc/tree.h
index fa79b6f..67454b7 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1114,9 +1114,6 @@ extern void protected_set_expr_location (tree, 
location_t);
the given label expression.  */
 #define LABEL_EXPR_LABEL(NODE)  TREE_OPERAND (LABEL_EXPR_CHECK (NODE), 0)
 
-/* VDEF_EXPR accessors are specified in tree-flow.h, along with the other
-   accessors for SSA operands.  */
-
 /* CATCH_EXPR accessors.  */
 #define CATCH_TYPES(NODE)  TREE_OPERAND (CATCH_EXPR_CHECK (NODE), 0)
 #define CATCH_BODY(NODE)   TREE_OPERAND (CATCH_EXPR_CHECK (NODE), 1)

[Patch, testsuite, mips] Fix test gcc.dg/delay-slot-1.c for MIPS

2014-01-09 Thread Steve Ellcey


The gcc.dg/delay-slot-1.c test is failing for MIPS targets that
do not support the 64 bit ABI because it didn't check to see
if that support existed before using the -mabi=64 flag.

This patch fixes the problem by using the mips64 check.

OK to checkin?

Steve Ellcey
sell...@mips.com


2014-01-09  Steve Ellcey  

* gcc.dg/delay-slot-1.c: Add check for 64 bit support.



diff --git a/gcc/testsuite/gcc.dg/delay-slot-1.c 
b/gcc/testsuite/gcc.dg/delay-slot-1.c
index f3bcd8e..bfc0273 100644
--- a/gcc/testsuite/gcc.dg/delay-slot-1.c
+++ b/gcc/testsuite/gcc.dg/delay-slot-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-options "-O2 -mabi=64" { target mips-*-linux-* } } */
+/* { dg-options "-O2 -mabi=64" { target { mips*-*-linux* && mips64 } } } */
 
 struct offset_v1 {
 int k_uniqueness;

[Patch] Fix PR plugin/59335, plugins not compiling

2014-01-09 Thread Steve Ellcey

Some header files needed by plugins are not present in the installed plugin
include directory.  This patch adds those headers.  I added all the headers
I needed for my plugin plus the ones mentioned in the patch.  I did not add
the x86 specific *.def files but this should fix all the generic platform
issues that are mentioned in the defect.

Tested with my switch-shortcut plugin on MIPS.

OK to checkin?

Steve Ellcey
sell...@mips.com


2014-01-09  Steve Ellcey  

PR plugins/59335
* Makefile.in (PLUGIN_HEADERS): Add gimplify.h, gimple-iterator.h,
gimple-ssa.h, fold-const.h, tree-cfg.h, tree-into-ssa.h,
tree-ssanames.h, print-tree.h, varasm.h, and context.h.


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8eb4f68..76766ba 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3118,7 +3118,9 @@ PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) 
coretypes.h $(TM_H) \
   cppdefault.h flags.h $(MD5_H) params.def params.h prefix.h tree-inline.h \
   $(GIMPLE_PRETTY_PRINT_H) realmpfr.h \
   $(IPA_PROP_H) $(TARGET_H) $(RTL_H) $(TM_P_H) $(CFGLOOP_H) $(EMIT_RTL_H) \
-  version.h stringpool.h
+  version.h stringpool.h gimplify.h gimple-iterator.h gimple-ssa.h \
+  fold-const.h tree-cfg.h tree-into-ssa.h tree-ssanames.h print-tree.h \
+  varasm.h context.h
 
 # generate the 'build fragment' b-header-vars
 s-header-vars: Makefile

[Patch, MIPS] Fix SYSROOT_SUFFIX_SPEC for mips-mti-linux-gnu

2015-07-09 Thread Steve Ellcey

This patch enables builds with mips[32|64]r3 and mips[32|64]r5 in the
mips-mti-linux-gnu toolchain.  t-mti-linux uses MULTILIB_MATCHES to
map these to r2 but SYSROOT_SUFFIX_SPEC was not being set properly
to find the sysroot (the r2 one) for these architectures.  This patch
fixes that problem by updating MIPS_SYSVERSION_SPEC which is used
by SYSROOT_SUFFIX_SPEC.  It will only affect the mips-mti-linux-gnu
toolchain.

Tested with mips-mti-linux-gnu, OK to checkin?

Steve Ellcey
sell...@imgtec.com



2015-07-09  Steve Ellcey  

* config/mips/mti-linux.h (MIPS_SYSVERSION_SPEC): Update
to handle mips[32|64]r3 and mips[32|64]r5.


diff --git a/gcc/config/mips/mti-linux.h b/gcc/config/mips/mti-linux.h
index 03d1baa..b497625 100644
--- a/gcc/config/mips/mti-linux.h
+++ b/gcc/config/mips/mti-linux.h
@@ -17,10 +17,14 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
-/* This target is a multilib target, specify the sysroot paths.  */
-#define MIPS_SYSVERSION_SPEC   \
-"%{mips32:r1}%{mips64:r1}%{mips32r2:r2}%{mips64r2:r2}" \
-"%{mips32r6:r6}%{mips64r6:r6}%{mips16:-mips16}"
+/* This target is a multilib target, specify the sysroot paths.
+   MIPS_SYSVERSION_SPEC defaults to 'r2' (mips32r2 or mips64r2) unless
+   'r1' or 'r6' are specifically given so that mips32r3, mips32r5,
+   mips64r3, and mips64r5 will all default to 'r2'.  See MULTILIB_MATCHES
+   definition in t-mti-linux.  */
+
+#define MIPS_SYSVERSION_SPEC \
+"%{mips32|mips64:r1;mips32r6|mips64r6:r6;:r2}%{mips16:-mips16}"
 
 #undef SYSROOT_SUFFIX_SPEC
 #define SYSROOT_SUFFIX_SPEC\

Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-28 Thread Steve Ellcey

Marek,

I have run into a problem with this warning and building glibc.

sysdeps/ieee754/s_matherr.c has:

int
weak_function
__matherr(struct exception *x)
{
int n=0;
if(x->arg1!=x->arg1) return 0;
return n;
}


And arg1 is a floating point type.  I think that if the value of
x->xarg1 is a NaN then the if statement should return TRUE because a NaN
never compares equal to anything, even another NaN (check with your
local IEEE expert).  I believe this method of checking for a NaN is
fairly common and I am not sure if GCC should be emitting a warning for
it.

Steve Ellcey
sell...@imgtec.com

[Patch, MIPS] MIPS specific optimization for o32 ABI

2015-07-28 Thread Steve Ellcey

This patch implements a MIPS o32 ABI specific optimization called frame
header optimization.  In the o32 ABI, routines allocate 16 bytes on the
stack before calling another routine.  This space is used by the callee
as space to write the register arguments to if their address is taken.
The n32 and n64 ABI's use the more common approach of copying register
arguments to local variables if their address is needed.

This optimization allows the callee to use that 16 bytes for other
purposes if it does not need it to write its arguments out to memory and
if it only needs 16 bytes of stack space (or less) for saving callee-saved
registers.

This can allow us to avoid having to allocate extra stack space in a routine
and to remove the stack pointer increment/decrement instructions from the 
prolog and epilogue which results in a small performance improvement.

This patch has been in the Mentor GCC toolchain for MIPS for a while and
gotten some testing there and I tested it on the top-of-tree GCC sources
with no regressions.

OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-07-28  Steve Ellcey  
Zoran Jovanovic  
Catherine Moore  
Tom de Vries  

* config/mips/mips.opt (mframe-header-opt): New option.
* config/mips/mips.c (struct mips_frame_info): Add
skip_stack_frame_allocation_p field.
(struct machine_function): Add callees_use_frame_header_p,
uses_frame_header_p, and initial_total_size fields.
(mips_frame_header_usage): New hash.
(mips_find_if_frame_header_is_used): New Function.
(mips_callee_use_frame_header): New Function.
(mips_callees_use_frame_header_p): New Function.
(mips_cfun_use_frame_header_p): New Function.
(mips_get_updated_offset): New Function.
(mips_skip_stack_frame_alloc): New Function.
(mips_frame_header_update_insn): New Function.
(mips_rest_of_frame_header_opt): New function.
(mips_compute_frame_info): Add recalculate and frame arguments.
(mips_frame_pointer_required): Add new args to
mips_compute_frame_info call.
(mips_initial_elimination_offset): Ditto.
(mips_gp_expand_needed_p): New function factored out of
mips_expand_ghost_gp_insns.
(mips_expand_ghost_gp_insns): Use mips_gp_expand_needed_p.
(mips_reorg): Use mips_rest_of_frame_header_opt.



2015-07-28  Steve Ellcey  
Tom de Vries  

* gcc.target/mips/fho-1.c: New test.
* gcc.target/mips/fho-2.c: New test.
* gcc.target/mips/mips.exp: Add -mframe-header-opt to
mips_option_groups.

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index c3cd52d..7cdef89 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -77,6 +77,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "dumpfile.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -380,6 +381,9 @@ struct GTY(())  mips_frame_info {
 
   /* The offset of hard_frame_pointer_rtx from the bottom of the frame.  */
   HOST_WIDE_INT hard_frame_pointer_offset;
+
+  /* Skip stack frame allocation if possible.  */
+  bool skip_stack_frame_allocation_p;
 };
 
 /* Enumeration for masked vectored (VI) and non-masked (EIC) interrupts.  */
@@ -472,6 +476,15 @@ struct GTY(())  machine_function {
   /* True if this is an interrupt handler that should use DERET
  instead of ERET.  */
   bool use_debug_exception_return_p;
+
+  /* True if some of the callees uses its frame header.  */
+  bool callees_use_frame_header_p;
+
+  /* True if current function uses its frame header.  */
+  bool uses_frame_header_p;
+
+  /* Frame size before updated by optimizations.  */
+  HOST_WIDE_INT initial_total_size;
 };
 
 /* Information about a single argument.  */
@@ -574,6 +587,8 @@ struct mips_rtx_cost_data
 
 /* Global variables for machine-dependent things.  */
 
+static hash_map *mips_frame_header_usage;
+
 /* The -G setting, or the configuration's default small-data limit if
no -G option is given.  */
 static unsigned int mips_small_data_threshold;
@@ -1296,6 +1311,7 @@ static const struct mips_rtx_cost_data
   }
 };
 
+static void mips_rest_of_frame_header_opt (void);
 static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool);
 static int mips_register_move_cost (machine_mode, reg_class_t,
 reg_class_t);
@@ -10358,6 +10374,114 @@ mips_save_reg_p (unsigned int regno)
   return false;
 }
 
+/* Try to find if function may use its incoming frame header.  */
+
+static bool
+mips_find_if_frame_header_is_used (tree fndecl)
+{
+  bool *frame_header_unused;
+
+  if (mips_frame_header_usage)
+frame_header_unused = mips_frame_header_usage->get (fndecl);
+  else
+frame_header_unused = false;
+
+  return !frame_header_unused;
+}
+
+/* Return true i

Re: [Patch, MIPS] MIPS specific optimization for o32 ABI

2015-07-31 Thread Steve Ellcey

On Fri, 2015-07-31 at 00:32 +, Joseph Myers wrote:
> New command-line options need documenting in invoke.texi.

Good point, thanks for catching that.  Here is an updated patch with
invoke.texi.  There are no other changes to the patch.

Steve Ellcey
sell...@imgtec.com



2015-07-31  Steve Ellcey  
Zoran Jovanovic  
Catherine Moore  
Tom de Vries  

* config/mips/mips.opt (mframe-header-opt): New option.
* config/mips/mips.c (struct mips_frame_info): Add
skip_stack_frame_allocation_p field.
(struct machine_function): Add callees_use_frame_header_p,
uses_frame_header_p, and initial_total_size fields.
(mips_frame_header_usage): New hash.
(mips_find_if_frame_header_is_used): New Function.
(mips_callee_use_frame_header): New Function.
(mips_callees_use_frame_header_p): New Function.
(mips_cfun_use_frame_header_p): New Function.
(mips_get_updated_offset): New Function.
(mips_skip_stack_frame_alloc): New Function.
(mips_frame_header_update_insn): New Function.
(mips_rest_of_frame_header_opt): New function.
(mips_compute_frame_info): Add recalculate and frame arguments.
(mips_frame_pointer_required): Add new args to
mips_compute_frame_info call.
(mips_initial_elimination_offset): Ditto.
(mips_gp_expand_needed_p): New function factored out of
mips_expand_ghost_gp_insns.
(mips_expand_ghost_gp_insns): Use mips_gp_expand_needed_p.
(mips_reorg): Use mips_rest_of_frame_header_opt.
* doc/invoke.texi (MIPS Options): Document -mframe-header-opt flag.


2015-07-31  Steve Ellcey  
Tom de Vries  

* gcc.target/mips/fho-1.c: New test.
* gcc.target/mips/fho-2.c: New test.
* gcc.target/mips/mips.exp: Add -mframe-header-opt to
mips_option_groups.





diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index c3cd52d..7cdef89 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -77,6 +77,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "dumpfile.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -380,6 +381,9 @@ struct GTY(())  mips_frame_info {
 
   /* The offset of hard_frame_pointer_rtx from the bottom of the frame.  */
   HOST_WIDE_INT hard_frame_pointer_offset;
+
+  /* Skip stack frame allocation if possible.  */
+  bool skip_stack_frame_allocation_p;
 };
 
 /* Enumeration for masked vectored (VI) and non-masked (EIC) interrupts.  */
@@ -472,6 +476,15 @@ struct GTY(())  machine_function {
   /* True if this is an interrupt handler that should use DERET
  instead of ERET.  */
   bool use_debug_exception_return_p;
+
+  /* True if some of the callees uses its frame header.  */
+  bool callees_use_frame_header_p;
+
+  /* True if current function uses its frame header.  */
+  bool uses_frame_header_p;
+
+  /* Frame size before updated by optimizations.  */
+  HOST_WIDE_INT initial_total_size;
 };
 
 /* Information about a single argument.  */
@@ -574,6 +587,8 @@ struct mips_rtx_cost_data
 
 /* Global variables for machine-dependent things.  */
 
+static hash_map *mips_frame_header_usage;
+
 /* The -G setting, or the configuration's default small-data limit if
no -G option is given.  */
 static unsigned int mips_small_data_threshold;
@@ -1296,6 +1311,7 @@ static const struct mips_rtx_cost_data
   }
 };
 
+static void mips_rest_of_frame_header_opt (void);
 static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool);
 static int mips_register_move_cost (machine_mode, reg_class_t,
 reg_class_t);
@@ -10358,6 +10374,114 @@ mips_save_reg_p (unsigned int regno)
   return false;
 }
 
+/* Try to find if function may use its incoming frame header.  */
+
+static bool
+mips_find_if_frame_header_is_used (tree fndecl)
+{
+  bool *frame_header_unused;
+
+  if (mips_frame_header_usage)
+frame_header_unused = mips_frame_header_usage->get (fndecl);
+  else
+frame_header_unused = false;
+
+  return !frame_header_unused;
+}
+
+/* Return true if the instruction is a call and the called function may use its
+   incoming frame header.  */
+
+static bool
+mips_callee_use_frame_header (rtx_insn *insn)
+{
+  rtx call_insn;
+  tree fndecl;
+
+  if (insn == NULL_RTX || !USEFUL_INSN_P (insn))
+return false;
+
+  /* Handle sequence of instructions.  */
+  if (GET_CODE (PATTERN (insn)) == SEQUENCE)
+{
+  rtx_insn *subinsn;
+  FOR_EACH_SUBINSN (subinsn, insn)
+	if (INSN_P (subinsn) && mips_callee_use_frame_header (subinsn))
+	  return true;
+}
+
+  if (GET_CODE (insn) != CALL_INSN)
+return false;
+
+  if (GET_CODE (PATTERN (insn)) != PARALLEL
+  || GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) != SET)
+return true;
+
+  call_insn = SET_SRC (X

Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-15 Thread Steve Ellcey

On Mon, 2015-09-14 at 09:50 +0200, Bernd Schmidt wrote:
> On 09/13/2015 08:24 PM, Mark Wielaard wrote:
> > commit 97505bd0e4ac15d86c2a302cfebc5f1a4fc2c2e8
> > Author: Mark Wielaard
> > Date:   Fri Sep 11 23:54:15 2015 +0200
> >
> >  PR28901 -Wunused-variable ignores unused const initialised variables 
> > in C
> 
> This is ok.
> 
> 
> Bernd

I am not sure I like this change.  It broke the GLIBC build for me on
MIPS.  Basically GLIBC has a header file with initialized static
constant globals (sysdeps/ieee754/dbl-64/atnat2.h contains tqpi1 and
qpi1) and that header file is included in multiple .c files like
sysdeps/ieee754/dbl-64/e_atan2.c that use some, but not all, of those
static constant variables.  But between the various .c files all of the
globals are used somewhere, just not in every individual .c file.  This
seems like a perfectly reasonable use of static globals and header files
that should not be identified as a warning.

This warning causes the GLIBC build to fail because GLIBC is compiled
with -Wall -Werror.

Steve Ellcey
sell...@imgtec.com

% cat a.h
static const int a = 3;
static const int b = 5;
static const int c = 4;

% cat a.c
#include "a.h"

int foo() { return a + b;}

% gcc -O2 -Werror -Wall -c a.c
In file included from a.c:1:0:
a.h:3:18: error: 'c' defined but not used
[-Werror=unused-const-variable]
 static const int c = 4;
  ^
cc1: all warnings being treated as errors

Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-15 Thread Steve Ellcey

On Tue, 2015-09-15 at 19:10 +0200, Jakub Jelinek wrote:
> On Tue, Sep 15, 2015 at 10:02:15AM -0700, Steve Ellcey wrote:
> > I am not sure I like this change.  It broke the GLIBC build for me on
> > MIPS.  Basically GLIBC has a header file with initialized static
> > constant globals (sysdeps/ieee754/dbl-64/atnat2.h contains tqpi1 and
> > qpi1) and that header file is included in multiple .c files like
> 
> Multiple?  All I can see is e_atan2.c including that header file, nothing
> else.

Whoops, bad assumption on my part.  I thought it must be included
somewhere else, otherwise why put it in a header.

> > sysdeps/ieee754/dbl-64/e_atan2.c that use some, but not all, of those
> > static constant variables.  But between the various .c files all of the
> > globals are used somewhere, just not in every individual .c file.  This
> > seems like a perfectly reasonable use of static globals and header files
> > that should not be identified as a warning.
> 
> I disagree.  While const vars are special in C++, it is really like
> any other variable in C, so the warning is IMHO appropriate.
> 
>   Jakub

I guess it is not the 'const' I think should be handled special but the
'static'.  Having unused static variables (const or not) declared in a
header file but unused seems reasonable since the header file may be
included in multiple .c files each of which uses a subset of the static
variables.

Steve Ellcey
sell...@imgtec.com

Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-21 Thread Steve Ellcey

On Fri, 2015-09-18 at 20:29 -0600, Martin Sebor wrote:

> On 09/15/2015 11:20 AM, Steve Ellcey wrote:

> > I guess it is not the 'const' I think should be handled special but the
> > 'static'.  Having unused static variables (const or not) declared in a
> > header file but unused seems reasonable since the header file may be
> > included in multiple .c files each of which uses a subset of the static
> > variables.
> 
> I tend to agree. I suppose diagnosing unused non-const static
> definitions might be helpful but I can't think of a good reason
> to diagnose unused initialized static consts in C. Especially
> since they're not diagnosed in C++.
> 
> Would diagnosing them in source files while avoiding the warning
> for static const definitions in headers be an acceptable compromise?
> 
> Martin

That seems like a reasonable compromise to me.

Steve Ellcey
sell...@imgtec.com

Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-24 Thread Steve Ellcey

On Thu, 2015-09-24 at 13:56 +0200, Bernd Schmidt wrote:

> I think at this point we have reports of just two packages generating 
> extra warnings, with the warnings at least justifiable in both cases. So 
> my vote would be to leave things as-is for now and see if more reports 
> come in. It is after all expected that a new warning option generates 
> new warnings.
> 
> 
> Bernd

At least one of the warnings in glibc is not justified (in my opinion).
The header file timezone/private.h defines time_t_min and time_t_max.
These are not used in any of the timezone files built by glibc but if
you look at the complete tz package they are used when building other
objects that are not part of the glibc tz component and that include
private.h.

I would make two arguments about why I don't think we should warn.

One is that 'static int const foo = 1' seems a lot like '#define foo 1'
and we don't complain about the macro foo not being used.  If we
complain about the unused const, why not complain about the unused
macro?  We don't complain because we know it would result in too many
warnings in existing code.  If we want people to move away from macros,
and I think we do, then we should not make it harder to do so by
introducing new warnings when they change.

The other is that C++ does not complain about this.  I know that C and
C++ are different languages with different rules but it seems like this
difference is a difference that doesn't have to exist.  Either both
should complain or neither should complain.  I can't think of any valid
reason for one to complain and the other not to.

I think using the used attribute is probably a reasonable way to address
this issue if we continue to generate the warning but I still feel it is
a bad warning in that it will (sometimes) warn about a coding style that
seems perfectly reasonable to me.

Steve Ellcey
sell...@imgtec.com

RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-02 Thread Steve Ellcey

I have spent some time trying to do dynamic stack alignment on MIPS and had
considerable trouble.  The problems are mainly due to the dwarf based stack
unwinding and setjmp/longjmp usages where the code does not go through the
normal function prologue and epilogue code.

Since the only need I have for a dynamically aligned stack is to ensure
that MSA registers (128 bits long, wanting 128 bit alignment) get spilled
to an aligned address on an ABI where the stack is only guaranteed to be
64 bit aligned, I decided to try an approach based on how local variables
are aligned beyond the supported stack alignment.  I allocate the space with
padding and then create an aligned pointer into that space.  Since I want
to do this in lra_spill and I can't allocate a new register to use at
that point what I do is create an 'aligned spill base register' based
on the frame_register in expand_prologue, and then in assign_mem_slot,
if I want an alignment greater than what the stack supports, I allocate
extra space and change the MEM reference for the spill from 'frame_register
+ offset' to 'aligned_spill_base_register + offset'.

The main advantage to this approach over dynamically aligning the stack
is that by not changing the real stack (or frame) pointer there is
minimal chance of breaking the ABI and there are no changes needed to
the dwarf unwind code.  The main disadvantage is that I am padding each
individual spill so I am wasting more space than absolutely required.
It should be possible to address this by putting all the aligned spills
together and sharing the padding but I would like to leave that for a
future improvement.  

In the mean time I would like to get some comments on this approach and
see what people think.  Does this seem like a reasonable approach to
allowing for aligned spills beyond what the stack supports?

I have tested this on MIPS, forcing 64 bit registers to be aligned,
even though the stack supports this natively as a way to test the code
before MSA is checked in and it seems to be working with no regressions.
I do get scan failures in the gcc.target/mips section because I am forcing
GCC to setup $16 as the 'aligned spill base register' in all routines
as a stress test but I do not get any new execution failures.

Steve Ellcey
sell...@imgtec.com


2015-10-02  Steve Ellcey  

* config/mips/mips.c (mips_attribute_table): Add align_spills
and no_align_spills attribute entries.
(mips_cfun_has_msa_p): New function.
(setup_align_basereg_p): Ditto.
(mips_align_spill_p): Ditto..
(mips_align_spill_base_regnum): Ditto..
(mips_align_spill_maximum): Ditto..
(mips_expand_prologue): Setup aligned spill register when needed.
(mips_conditional_register_usage): Reserve aligned spill register when
needed.
(TARGET_ALIGN_SPILL_P): Set to mips_align_spill_p.
(TARGET_ALIGN_SPILL_BASE_REGNUM): Set to mips_align_spill_base_regnum.
(TARGET_ALIGN_SPILL_MAXIMUM): Set to mips_align_spill_maximum.
* config/mips/mips.opt (malign-spills): New option.
* df-scan.c (df_get_regular_block_artificial_uses): Check for usage
of aligned spill register.
(df_get_eh_block_artificial_uses): Ditto.
* hooks.c (hook_uint_void_invalid_regnum): New function.
* hooks.h (hook_uint_void_invalid_regnum): Ditto.
* lra-spills.c (align_slot_address): New function.
(assign_mem_slot): Align slot address if necessary.
* target.def (align_spill_p): New hook.
(align_spill_base_regnum): New hook.
(align_spill_maximum): New hook.
* tm.texi.in (TARGET_ALIGN_SPILL_P): New hook.
(TARGET_ALIGN_SPILL_BASE_REGNUM): Ditto.
(TARGET_ALIGN_SPILL_MAXIMUM): Ditto.
* tm.texi: Regenerate.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 0e0ecf2..84c3724 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -773,6 +773,8 @@ static const struct attribute_spec mips_attribute_table[] = 
{
 mips_handle_use_shadow_register_set_attr, false },
   { "keep_interrupts_masked",  0, 0, false, true,  true, NULL, false },
   { "use_debug_exception_return", 0, 0, false, true,  true, NULL, false },
+  { "align_spills", 0, 0, true, false, false, NULL, false },
+  { "no_align_spills", 0, 0, true, false, false, NULL, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
@@ -1607,6 +1609,49 @@ mips_merge_decl_attributes (tree olddecl, tree newdecl)
   DECL_ATTRIBUTES (newdecl));
 }
 
+/* Determine if a function may use MSA registers and thus need
+   aligned spills.  */
+
+static bool
+mips_cfun_has_msa_p (void)
+{
+  /* For now, for testing, assume all functions use MSA
+ (and thus may need aligned spills ).  */
+#if 0
+  if (!cfun || !TARGET_MSA)
+return FALSE;
+
+  for (insn = get_insns ();

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-05 Thread Steve Ellcey

On Mon, 2015-10-05 at 10:41 +0200, Bernd Schmidt wrote:
> On 10/02/2015 10:57 PM, Steve Ellcey wrote:
> > I have spent some time trying to do dynamic stack alignment on MIPS and had
> > considerable trouble.  The problems are mainly due to the dwarf based stack
> > unwinding and setjmp/longjmp usages where the code does not go through the
> > normal function prologue and epilogue code.
> [...]
> > The main advantage to this approach over dynamically aligning the stack
> > is that by not changing the real stack (or frame) pointer there is
> > minimal chance of breaking the ABI and there are no changes needed to
> > the dwarf unwind code.  The main disadvantage is that I am padding each
> > individual spill so I am wasting more space than absolutely required.
> > It should be possible to address this by putting all the aligned spills
> > together and sharing the padding but I would like to leave that for a
> > future improvement.
> >
> > In the mean time I would like to get some comments on this approach and
> > see what people think.  Does this seem like a reasonable approach to
> > allowing for aligned spills beyond what the stack supports?
> 
> Personally I'm not a fan. Your description of it makes it sound 
> immensely wasteful, and I'm really not clear on why stack alignment 
> wouldn't work for MIPS when it's been shown to work elsewhere. I think 
> we'd want to see a clear demonstration of unfixable problems with stack 
> alignment before allowing something like this in.
> 
> Vlad would have to comment on the LRA bits, probably.
> 
> 
> Bernd

There probably is some way to get dynamic stack alignment to work on
MIPS, but I am not sure I can do it.  The only platform that I see that
uses dynamic stack alignment is x86.  I think the difficulties in
getting this to work correctly is why no other platform has implemented
it.  The most common response I have gotten when asking around for help
on dynamic stack alignment is usually "just break the ABI".

My approach does waste some space, on MIPS I would be allocating 32
bytes of stack space to spill a 16 byte MSA register, but the
hope/belief is that MSA registers would not get spilled very often.

Steve Ellcey

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-05 Thread Steve Ellcey

On Mon, 2015-10-05 at 09:21 -0700, H.J. Lu wrote:
> On Mon, Oct 5, 2015 at 9:10 AM, Steve Ellcey  wrote:

> > There probably is some way to get dynamic stack alignment to work on
> > MIPS, but I am not sure I can do it.  The only platform that I see that
> > uses dynamic stack alignment is x86.  I think the difficulties in
> > getting this to work correctly is why no other platform has implemented
> > it.  The most common response I have gotten when asking around for help
> > on dynamic stack alignment is usually "just break the ABI".
> >
> > My approach does waste some space, on MIPS I would be allocating 32
> > bytes of stack space to spill a 16 byte MSA register, but the
> > hope/belief is that MSA registers would not get spilled very often.
> 
> We keep track stack frame information precisely in x86 backend,
> including unwind info, and we fixed DWARF unwind info generation to
> support it.  We used gcc_assert to verify that everything is in
> sync.
> 
> I don't know what is missing to support MIPS dynamic stack alignment.
> Unless DWARF unwind info isn't sufficient for MIPS, I don't see why
> dynamic stack alignment can't be done for MIPS.  If you get the wrong
> DWARF unwind info, you can add assert to GCC source to track down
> its origin and fix assert to generate the correct DWARF unwind info.

The problem is that I don't know what is missing either.  I don't know
what the 'correct' stack frame information should look like so I don't
really know what I am trying to generate.  readelf cannot decode the
unwind section of MIPS objects so I can't look at things that way and I
have been trying to work based on what .cfi directives I think I should
be generating and that has not been going well.

One example of an issue I have run into is with the DWARF unwind
generation and 'Rule 16' in dwarf2cfi.c.  It assumes the AND instruction
has an integer constant argument but MIPS can't do an AND with a
constant like -16 so it has to put it in a register first and do the AND
with the register.  I just hacked that code as a temporary workaround
but its the sort of x86 assumption that I have run into due to the fact
that no platform other than x86 currently does dynamic stack alignment.

Steve Ellcey
sell...@imgtec.com

RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI

2015-10-05 Thread Steve Ellcey

On Mon, 2015-09-28 at 22:10 +, Moore, Catherine wrote:

> Hi Steve, I'm sorry for the delay in reviewing this patch. 
> Some changes have been committed upstream (see revision #227941) that will
> require updates to this patch.
> Please post the update for review.  Other comments are embedded.

OK, I have updated the comments based on your input and changed the code
to compile with the ToT GCC after revision @227941.  Here is the new
patch.


2015-10-05  Steve Ellcey  

* config.gcc (mips*-*-*): Add frame-header-opt.o to extra_objs.
* frame-header-opt.c: New file.
* config/mips/mips-proto.h (mips_register_frame_header_opt):
Add prototype.
* config/mips/mips.c (mips_compute_frame_info): Check
optimize_call_stack flag.
(mips_option_override): Register new frame_header_opt pass.
(mips_frame_info, mips_int_mask, mips_shadow_set,
machine_function): Move these types to...
* config/mips/mips.h: here.
(machine_function): Add does_not_use_frame_header and
optimize_call_stack fields.
* config/mips/t-mips (frame-header-opt.o): Add new make rule.
* doc/invoke.texi (-mframe-header-opt, -mno-frame-header-opt):
Document new flags.
* config/mips/mips.opt (mframe-header-opt): Add new option.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 56797bd..7bf66f8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -420,6 +420,7 @@ microblaze*-*-*)
 mips*-*-*)
cpu_type=mips
extra_headers="loongson.h"
+   extra_objs="frame-header-opt.o"
extra_options="${extra_options} g.opt fused-madd.opt 
mips/mips-tables.opt"
;;
 nds32*)
diff --git a/gcc/config/mips/frame-header-opt.c 
b/gcc/config/mips/frame-header-opt.c
new file mode 100644
index 000..7c7b1f2
--- /dev/null
+++ b/gcc/config/mips/frame-header-opt.c
@@ -0,0 +1,216 @@
+/* Analyze functions to determine if callers need to allocate a frame header
+   on the stack.  The frame header is used by callees to save their arguments.
+   This optimization is specific to TARGET_OLDABI targets.  For TARGET_NEWABI
+   targets, if a frame header is required, it is allocated by the callee.
+
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "context.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "tree-core.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "target-globals.h"
+#include "cfg.h"
+#include "cgraph.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+
+static unsigned int frame_header_opt (void);
+
+namespace {
+
+const pass_data pass_data_ipa_frame_header_opt =
+{
+  IPA_PASS, /* type */
+  "frame-header-opt", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_CGRAPHOPT, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_ipa_frame_header_opt : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_frame_header_opt (gcc::context *ctxt)
+: ipa_opt_pass_d (pass_data_ipa_frame_header_opt, ctxt,
+  NULL, /* generate_summary */
+  NULL, /* write_summary */
+  NULL, /* read_summary */
+  NULL, /* write_optimization_summary */
+  NULL, /* read_optimization_summary */
+  NULL, /* stmt_fixup */
+  0, /* function_transform_todo_flags_start */
+  NULL, /* function_transform */
+  NULL) /* variable_transform */
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+{
+  /* This optimization has no affect if TARGET_NEWABI.   If optimize
+ is not at least 1 then the data needed for the optimization is
+ not available and nothing will be done anyway.  */
+  return TARGET_OLDABI && flag_frame_header_optimization;
+}
+
+  virtual unsigned int execute (funct

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread Steve Ellcey

On Mon, 2015-10-05 at 09:57 -0700, H.J. Lu wrote:

> You need to update dwarf2cfi.c to generate proper unwind info for
> whatever frame instructions MIPS generates, like what we did for
> x86 dynamic stack realignment.

The problem is understanding what the 'proper' unwind info is and
figuring out what is wrong about it when it doesn't work.

I used Bernd's comment about Rule #6 to handle the sp = sp AND reg
issue, but I have a couple of more dwarf/cfi questions.

One, can you explicitly make a copy of the stack pointer to another
register and not make that register the new stack pointer?  I want to
save the old stack pointer before aligning it but when I do that I think
that dwarfcfi.c automatically assumes that the new register is now the
stack pointer and that is not what I want.  I want the stack pointer to
remain as the original register.

My other question is about 'set_unexpected' and how that affects
the generated unwind info.  I noticed that a lot of my failing tests use
'set_unexpected' and I don't know what this function does or how it
affects the generated code that would cause these tests in particular to
fail.

Steve Ellcey
sell...@imgtec.com

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread Steve Ellcey

On Tue, 2015-10-06 at 17:39 +0200, Bernd Schmidt wrote:

> 
> Did your tag that copy as RTX_FRAME_RELATED? I'd expect dwarf2cfi would 
> ignore instructions with that bit unset. There's even a comment 
> indicating arm does something like this:

Yes, I had marked it as RTX_FRAME_RELATED.  When I took that out things
looked better (well, maybe just different).

If I remove that and I change Rule #16 to handle an AND with a register
I get odd looking .cfi stuff.  The AND instruction (which is marked with
RTX_FRAME_RELATED) seems to generate these cfi_escape macros:

.cfi_escape 0x10,0x1f,0x2,0x8e,0x7c
.cfi_escape 0x10,0x1e,0x2,0x8e,0x78
.cfi_escape 0x10,0xc,0x2,0x8e,0x74

which are meaningless to me.  What I found works better is to skip the
dwarf2cfi.c changes and explicitly attach this note to the AND
instruction:

  cfi_note = alloc_reg_note (REG_CFA_DEF_CFA, stack_pointer_rtx, NULL_RTX);
  REG_NOTES (insn) = cfi_note;

When I do this the only unexpected test suite execution failures I see
are ones that call the C++ set_unexpected function.

FYI: The reason I was copying the stack pointer to another register was
to use that other register as my argument pointer (needed after the
stack pointer got aligned) and to use it to restore the stack pointer at
the end of the function for normal returns.

Steve Ellcey
sell...@imgtec.com

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-06 Thread Steve Ellcey

On Tue, 2015-10-06 at 11:10 -0700, H.J. Lu wrote:

> Does it pass all tests under g++.dg/torture/stackalign?  You need
> to implement -mstackrealign and -mpreferred-stack-boundary=
> as well as update check_effective_target_automatic_stack_alignment
> to run all stack alignment tests.

No, those tests were not run.  I have not implemented the -mstackrealign
or -mpreferred-stack-boundary options.  I do not think those are the
options I want for stack realignment on MIPS.  The only reason
for implementing stack realignment on MIPS is to align MSA (vector)
register spills.  Yes, stack realignment can also be used with aligned
local variables but those could also be handled with the existing alloca
method without implementing stack realignment.

So I want an option (turned on by default when using -mmsa) that only
aligns a function if that function contains MSA register usage.  I don't
want an option to align every function because that would slow things
down for no reason.  I should and will try these test cases with my
option.  Right now my testing forces all functions to be aligned even if
they don't use MSA because that gives me maximum testing, but the final
code will only align functions that use MSA.

Steve Ellcey
sell...@imgtec.com

RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI

2015-10-06 Thread Steve Ellcey

On Tue, 2015-10-06 at 12:02 +, Moore, Catherine wrote:

> > Moore, Catherine  writes:
> > > The patch itself looks good, but the tests that you added need a little 
> > > more
> > work.

Here is a new patch for just the tests.  I added NOMIPS16 and the
-mabi=32 flag as Matthew suggested and I also added -mno-abicalls.

Without the -mno-abicalls the MIPS linux compiler defaulted to
-mabicalls and allocated 32 bytes without the optimization and the MIPS
elf compiler defaulted to -mno-abicalls and allocated 24 bytes without
the optimization.  This synchronizes them so they both allocate 24 bytes
without the optimization and 8 bytes with it.  Everything should pass
now, it passed for me using mips-mti-linux-gnu and mips-mti-elf.

Steve Ellcey
sell...@imgtec.com


2015-10-06  Steve Ellcey  

* gcc.target/mips/mips.exp (mips_option_groups): Add -mframe-header-opt
and -mno-frame-header-opt options.
* gcc.target/mips/frame-header-1.c: New file.
* gcc.target/mips/frame-header-2.c: New file.
* gcc.target/mips/frame-header-3.c: New file.


diff --git a/gcc/testsuite/gcc.target/mips/frame-header-1.c 
b/gcc/testsuite/gcc.target/mips/frame-header-1.c
new file mode 100644
index 000..971656d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/frame-header-1.c
@@ -0,0 +1,21 @@
+/* Verify that we do not optimize away the frame header in foo when using
+   -mno-frame-header-opt by checking the stack pointer increment done in
+   that function.  Without the optimization foo should increment the stack
+   by 24 bytes, with the optimization it would only be 8 bytes.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mno-frame-header-opt -mabi=32 -mno-abicalls" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-24" } } */
+
+NOMIPS16 void __attribute__((noinline))
+bar (int* a)
+{
+  *a = 1;
+}
+
+NOMIPS16 void
+foo (int a)
+{
+  bar (&a);
+}
diff --git a/gcc/testsuite/gcc.target/mips/frame-header-2.c 
b/gcc/testsuite/gcc.target/mips/frame-header-2.c
new file mode 100644
index 000..0e86bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/frame-header-2.c
@@ -0,0 +1,21 @@
+/* Verify that we do optimize away the frame header in foo when using
+   -mframe-header-opt by checking the stack pointer increment done in
+   that function.  Without the optimization foo should increment the
+   stack by 24 bytes, with the optimization it would only be 8 bytes.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mframe-header-opt -mabi=32 -mno-abicalls" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-8" } } */
+
+NOMIPS16 void __attribute__((noinline))
+bar (int* a)
+{
+  *a = 1;
+}
+
+NOMIPS16 void
+foo (int a)
+{
+  bar (&a);
+}
diff --git a/gcc/testsuite/gcc.target/mips/frame-header-3.c 
b/gcc/testsuite/gcc.target/mips/frame-header-3.c
new file mode 100644
index 000..2a8c515
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/frame-header-3.c
@@ -0,0 +1,22 @@
+/* Verify that we do not optimize away the frame header in foo when using
+   -mframe-header-opt but are calling a weak function that may be overridden
+   by a different function that does need the frame header.  Without the
+   optimization foo should increment the stack by 24 bytes, with the
+   optimization it would only be 8 bytes.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mframe-header-opt -mabi=32 -mno-abicalls" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-24" } } */
+
+NOMIPS16 void __attribute__((noinline, weak))
+bar (int* a)
+{
+  *a = 1;
+}
+
+void
+NOMIPS16 foo (int a)
+{
+  bar (&a);
+}
diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 42e7fff..0f2d6a2 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -256,6 +256,7 @@ set mips_option_groups {
 maddps "HAS_MADDPS"
 lsa "(|!)HAS_LSA"
 section_start "-Wl,--section-start=.*"
+frame-header "-mframe-header-opt|-mno-frame-header-opt"
 }
 
 for { set option 0 } { $option < 32 } { incr option } {

Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-10-06 Thread Steve Ellcey

On Thu, 2015-09-24 at 21:24 -0400, Trevor Saunders wrote:
> On Thu, Sep 24, 2015 at 06:55:11PM +0200, Bernd Schmidt wrote:
> > On 09/24/2015 06:11 PM, Steve Ellcey wrote:
> > >At least one of the warnings in glibc is not justified (in my opinion).
> > >The header file timezone/private.h defines time_t_min and time_t_max.
> > >These are not used in any of the timezone files built by glibc but if
> > >you look at the complete tz package they are used when building other
> > >objects that are not part of the glibc tz component and that include
> > >private.h.
> > 
> > The standard C way of writing this would be to declare time_t_min in the
> > header and have its definition in another file, or use a TIME_T_MIN macro as
> > glibc does in mktime.c. That file even has a local redefinition:
> >   time_t time_t_min = TIME_T_MIN;
> > So at the very least the warning points at code that has some oddities.
> 
> I can believe its an odd way to write C, but is it actually a bad one?
> I expect if I got warnings for code like that I'd be pretty unhappy
> about either moving the constant out where the compiler can't always see
> it, or making it a macro.
> 
> > >I would make two arguments about why I don't think we should warn.
> > >
> > >One is that 'static int const foo = 1' seems a lot like '#define foo 1'
> > >and we don't complain about the macro foo not being used.  If we
> > >complain about the unused const, why not complain about the unused
> > >macro?  We don't complain because we know it would result in too many
> > >warnings in existing code.  If we want people to move away from macros,
> > >and I think we do, then we should not make it harder to do so by
> > >introducing new warnings when they change.
> > >
> > >The other is that C++ does not complain about this.  I know that C and
> > >C++ are different languages with different rules but it seems like this
> > >difference is a difference that doesn't have to exist.  Either both
> > >should complain or neither should complain.  I can't think of any valid
> > >reason for one to complain and the other not to.
> > 
> > Well, they _are_ different languages, and handling of const is one place
> > where they differ. For example, C++ consts can be used in places where
> > constant expressions are required. The following is a valid C++ program but
> > not a C program:
> > 
> > const int v = 200;
> > int t[v];
> > 
> > The result is that the typical programming style for C is to have constants
> > #defined, while for C++ you can find more examples like the above; I recall
> > Stroustrup explicitly advocating that in the introductory books I read 20
> > years ago, and using it as a selling point for C++. Existing practice is
> > important when deciding what to warn about, and for the moment I remain
> > convinced that C practice is sufficiently different from C++.
> 
> existing practice is certainly important, but I would say that what is
> good practice is also very important.  It seems to me that warning for
> these constants is basically making it hard to follow a better practice
> than the existing one.  That seems pretty unfortunate.
> 
> On the other hand I've become much more of a C++ programmer than a C one
> so, I'm probably not the best judge.
> 
> Trev
> 
> > 
> > 
> > Bernd

So, is there any consensus on this issue?  I cannot build top-of-tree
glibc with top-of-tree GCC due to this warning coming from
timezone/private.h.  If GCC is going to keep this warning then I should
talk to the owners of the header file (tz group, not glibc) and see if
they would be willing to add a unused attribute to it.  Personally, I
would rather have GCC not issue the warning, but that is just based on
my opinion that the definition of a (possibly) unused static initialized
variable in a C header is a reasonable practice.

Steve Ellcey
sell...@imgtec.com

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-09 Thread Steve Ellcey

On Tue, 2015-10-06 at 11:10 -0700, H.J. Lu wrote:

> Does it pass all tests under g++.dg/torture/stackalign?  You need
> to implement -mstackrealign and -mpreferred-stack-boundary=
> as well as update check_effective_target_automatic_stack_alignment
> to run all stack alignment tests.

FYI: I was able to run those tests with my stack alignment flag and
without any failures.  I notice that none of them use set_unexpected,
which is where I am having problems on MIPS.  Have you tried running any
of the following tests that use set_unexpected with your stack alignment
flags?  They fail or me on MIPS and I was wondering if they work for you
on x86.

g++.dg/cpp0x/lambda/lambda-eh2.C
g++.dg/eh/unexpected1.C
g++.old-deja/g++.eh/spec2.C
g++.old-deja/g++.eh/spec3.C
g++.old-deja/g++.mike/eh33.C
g++.old-deja/g++.mike/eh50.C
g++.old-deja/g++.mike/eh51.C

Steve Ellcey
sell...@imgtec.com

Re: RFC: Patch to allow spill slot alignment greater than the stack alignment

2015-10-09 Thread Steve Ellcey

On Fri, 2015-10-09 at 15:36 -0700, H.J. Lu wrote:

> 
> I am not sure what you were asking.  I tried:
> 
> make check-g++ RUNTESTFLAGS="--target_board='unix{-m32\
> -mstackrealign}' old-deja.exp=spec*.C"
> ...
> 
> === g++ Summary ===
> 
> # of expected passes 495
> # of expected failures 3
> 
> make check-g++ RUNTESTFLAGS="--target_board='unix{-m32\
> -mstackrealign}' old-deja.exp=eh*.C"
> ...
> === g++ Summary ===
> 
> # of expected passes 372

OK, that was what I was wondering about.  I wasn't sure if you had run
the entire test suite with -mstackrealign or only the tests in 
g++.dg/torture/stackalign.

Steve Ellcey
sell...@imgtec.com

[Patch, MIPS] Frame header optimization for MIPS (part 2)

2015-10-16 Thread Steve Ellcey


Here is the second part of the MIPS frame header optimization patch.  The
first part avoided allocating a frame header if it knew that none of the
functions that it called would need it.  This part uses a frame header
to save callee saved registers if doing so will allow the function to avoid
having to allocate stack space (thus avoiding the need to increment and
decrement the stack pointer at all).

I did a little reorganization of my first patch as part of this, seperating
the is_leaf_function check from needs_frame_header_p and making it a
seperate check from callees_functions_use_frame_header.  This allowed me
to reuse the needs_frame_header_p function and the does_not_use_frame_header
field that stores its value in this new patch.

The optimimization may be more conservative than it has to be in some
cases like functions with floating point arguments but it catches the
most important cases and we can extend it later if needed.

Tested with no regressions using the mips-mti-linux-gnu toolchain.

OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-10-16  Steve Ellcey  

* frame-header-opt.c (gate): Check for optimize > 0.
(has_inlined_assembly): New function.
(needs_frame_header_p): Remove is_leaf_function check,
add argument type check.
(callees_functions_use_frame_header): Add is_leaf_function
and has_inlined_assembly calls..
(set_callers_may_not_allocate_frame): New function.
(frame_header_opt): Add is_leaf_function call, add
set_callers_may_not_allocate_frame call.
* config/mips/mips.c (mips_compute_frame_info): Add check
to see if callee saved regs can be put in frame header.
(mips_expand_prologue): Add check to see if step1 is zero,
fix cfa restores when using frame header to store regs.
(mips_can_use_return_insn): Check to see if registers are
stored in frame header.
* config/mips/mips.h (machine_function): Add
callers_may_not_allocate_frame and
use_frame_header_for_callee_saved_regs fields.


diff --git a/gcc/config/mips/frame-header-opt.c 
b/gcc/config/mips/frame-header-opt.c
index 7c7b1f2..5512838 100644
--- a/gcc/config/mips/frame-header-opt.c
+++ b/gcc/config/mips/frame-header-opt.c
@@ -79,7 +79,7 @@ public:
   /* This optimization has no affect if TARGET_NEWABI.   If optimize
  is not at least 1 then the data needed for the optimization is
  not available and nothing will be done anyway.  */
-  return TARGET_OLDABI && flag_frame_header_optimization;
+  return TARGET_OLDABI && flag_frame_header_optimization && (optimize > 0);
 }
 
   virtual unsigned int execute (function *) { return frame_header_opt (); }
@@ -125,6 +125,29 @@ is_leaf_function (function *fn)
   return true;
 }
 
+/* Return true if this function has inline assembly code or if we cannot
+   be certain that it does not.  False if know that there is no inline
+   assembly.  */
+
+static bool
+has_inlined_assembly (function *fn)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+
+  /* If we do not have a cfg for this function be conservative and assume
+ it is may have inline assembly.  */
+  if (fn->cfg == NULL)
+return true;
+
+  FOR_EACH_BB_FN (bb, fn)
+for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  if (gimple_code (gsi_stmt (gsi)) == GIMPLE_ASM)
+   return true;
+
+  return false;
+}
+
 /* Return true if this function will use the stack space allocated by its
caller or if we cannot determine for certain that it does not.  */
 
@@ -136,20 +159,26 @@ needs_frame_header_p (function *fn)
   if (fn->decl == NULL)
 return true;
 
-  if (fn->stdarg || !is_leaf_function (fn))
+  if (fn->stdarg)
 return true;
 
   for (t = DECL_ARGUMENTS (fn->decl); t; t = TREE_CHAIN (t))
 {
   if (!use_register_for_decl (t))
- return true;
+   return true;
+
+  /* Some 64 bit types may get copied to general registers using the frame
+header, see mips_output_64bit_xfer.  Checking for SImode only may be
+ overly restrictive but it is gauranteed to be safe. */
+  if (DECL_MODE (t) != SImode)
+   return true;
 }
 
   return false;
 }
 
-/* Returns TRUE if the argument stack space allocated by function FN is used.
-   Returns FALSE if the space is needed or if the need for the space cannot
+/* Return true if the argument stack space allocated by function FN is used.
+   Return false if the space is needed or if the need for the space cannot
be determined.  */
 
 static bool
@@ -177,6 +206,8 @@ callees_functions_use_frame_header (function *fn)
  called_fn = DECL_STRUCT_FUNCTION (called_fn_tree);
  if (called_fn == NULL
  || DECL_WEAK (called_fn_tree) 
+ || has_inlined_assembly (called_fn)
+ || !is_leaf_function (called

[Patch, MIPS] Patch to fix MIPS optimization bug in combine

2015-10-21 Thread Steve Ellcey


A bug was reported against the GCC MIPS64 compiler that involves a bad combine
and this patch fixes the bug.

When using '-fexpensive-optimizations -march=mips64r2 -mabi=64' GCC is
combining these instructions:

(insn 13 12 14 2 (set (reg:DI 206 [ *last_3(D)+-4 ])
(zero_extend:DI (subreg/s/u:SI (reg:DI 196 [ *last_3(D)+-4 ]) 4))) 
x.c:21 212 {*zero_extendsidi2_dext}
 (nil))

(insn 15 14 16 2 (set (reg:DI 208)
(and:DI (reg:DI 207)
(const_int 4294967295 [0x]))) x.c:21 182 {*anddi3}
 (expr_list:REG_DEAD (reg:DI 207)
(nil)))

(jump_insn 16 15 17 2 (set (pc)
(if_then_else (ne (reg:DI 206 [ *last_3(D)+-4 ])
(reg:DI 208))
(label_ref:DI 35)
(pc))) x.c:21 473 {*branch_equalitydi}
 (expr_list:REG_DEAD (reg:DI 208)
(expr_list:REG_DEAD (reg:DI 206 [ *last_3(D)+-4 ])
(int_list:REG_BR_PROB 8010 (nil
 -> 35)

Into:

(jump_insn 16 15 17 2 (set (pc)
(if_then_else (ne (subreg:SI (reg:DI 207) 4)
(subreg:SI (reg:DI 196 [ *last_3(D)+-4 ]) 4))
(label_ref:DI 35)
(pc))) x.c:21 472 {*branch_equalitysi}
 (expr_list:REG_DEAD (reg:DI 207)
(int_list:REG_BR_PROB 8010 (nil)))
 -> 35)


The problem is that the jump_insn on MIPS64 generates a beq/bne instruction
that compares the entire 64 bit registers and there is no option to only
look at the bottom 32 bits.  When we got rid of insn 15 we lost the AND that
cleared the upper 32 bits of one of the registers and the program fails.

My solution was to not allow subregs in the conditional jump instruction.
Here is the patch and a test case and I ran the GCC testsuite with no
regressions.

OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-10-21  Steve Ellcey  

* mips.c (mips_legitimate_combined_insn): New function.
(TARGET_LEGITIMATE_COMBINED_INSN): New macro.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 0b4a5fa..4338628 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -19606,6 +19606,26 @@ mips_ira_change_pseudo_allocno_class (int regno, 
reg_class_t allocno_class)
 return GR_REGS;
   return allocno_class;
 }
+
+/* Implement TARGET_LEGITIMATE_COMBINED_INSN */
+
+static bool
+mips_legitimate_combined_insn (rtx_insn* insn)
+{
+  /* If we do a conditional jump involving register compares do not allow
+ subregs because beq/bne will always compare the entire register.
+ This should only be an issue with N32/N64 ABI's that do a 32 bit
+ compare and jump.  */
+
+  if (any_condjump_p (insn))
+{
+  rtx cond = XEXP (SET_SRC (pc_set (insn)), 0);
+  if (GET_RTX_CLASS (GET_CODE (cond)) == RTX_COMPARE
+ || GET_RTX_CLASS (GET_CODE (cond)) == RTX_COMM_COMPARE)
+   return !SUBREG_P (XEXP (cond, 0)) && !SUBREG_P (XEXP (cond, 1));
+}
+  return true;
+}
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -19868,6 +19888,9 @@ mips_ira_change_pseudo_allocno_class (int regno, 
reg_class_t allocno_class)
 #undef TARGET_HARD_REGNO_SCRATCH_OK
 #define TARGET_HARD_REGNO_SCRATCH_OK mips_hard_regno_scratch_ok
 
+#undef TARGET_LEGITIMATE_COMBINED_INSN
+#define TARGET_LEGITIMATE_COMBINED_INSN mips_legitimate_combined_insn
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-mips.h"





2015-10-21  Steve Ellcey  

* gcc.dg/combine-subregs.c: New test.


diff --git a/gcc/testsuite/gcc.dg/combine-subregs.c 
b/gcc/testsuite/gcc.dg/combine-subregs.c
index e69de29..c480f88 100644
--- a/gcc/testsuite/gcc.dg/combine-subregs.c
+++ b/gcc/testsuite/gcc.dg/combine-subregs.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fexpensive-optimizations" } */
+
+#include 
+#include 
+
+void __attribute__ ((noinline))
+foo (uint64_t state, uint32_t last)
+{
+  if (state == last) abort ();
+}
+
+/* This function may do a bad comparision by trying to
+   use SUBREGS during the compare on machines where comparing
+   two registers always compares the entire register regardless
+   of mode.  */
+
+int __attribute__ ((noinline))
+compare (uint64_t state, uint32_t *last, uint8_t buf)
+{
+if (*last == ((state | buf) & 0x)) {
+   foo (state, *last);
+return 0;
+}
+return 1;
+}
+
+int
+main(int argc, char **argv) {
+uint64_t state = 0xF0100U;
+uint32_t last  = 0x101U;
+int ret= compare(state, &last, 0x01);
+if (ret != 0)
+   abort ();
+exit (0);
+}

Re: [Patch, MIPS] Frame header optimization for MIPS (part 2)

2015-10-21 Thread Steve Ellcey

On Wed, 2015-10-21 at 20:45 +0200, Bernd Schmidt wrote:
> On 10/21/2015 07:38 PM, Mike Stump wrote:
> > On Oct 20, 2015, at 3:55 PM, Bernd Schmidt  wrote:
> >> There are many unnecessary parentheses in this patch which should be 
> >> removed.
> >
> > I can smell a -Wall patch.  :-)
> 
> Not really, because -Wall isn't in the business of enforcing a coding style.
> 
> 
> Bernd

Hm, how about a separate warning that wasn't part of -Wall but could
still be used by GCC or other products that wanted to enforce a 'no
unnecessary parenthesis' coding style.  Not that I'm volunteering.

Steve Ellcey
sell...@imgtec.com

Re: [Patch, MIPS] Patch to fix MIPS optimization bug in combine

2015-10-21 Thread Steve Ellcey

Andrew,

Here is the new patch that I am currently testing with your change and
incorporating Eric's comment.  I included both test cases but renamed
yours and put it into gcc.dg/torture.  Does the code in combine.c to
address Eric's comment look OK to you?

Steve Ellcey
steve.ell...@imgtec.com


2015-10-21  Steve Ellcey  
Andrew Pinski  

* combine.c (simplify_comparison): Use gen_lowpart_or_truncate instead
of gen_lowpart when we had a truncating and.


diff --git a/gcc/combine.c b/gcc/combine.c
index fd3e19c..7568ebb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11529,8 +11529,8 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx 
*pop1)
 tmode != GET_MODE (op0); tmode = GET_MODE_WIDER_MODE (tmode))
  if ((unsigned HOST_WIDE_INT) c0 == GET_MODE_MASK (tmode))
{
- op0 = gen_lowpart (tmode, inner_op0);
- op1 = gen_lowpart (tmode, inner_op1);
+ op0 = gen_lowpart_or_truncate (tmode, inner_op0);
+ op1 = gen_lowpart_or_truncate (tmode, inner_op1);
  code = unsigned_condition (code);
  changed = 1;
  break;
@@ -12048,12 +12048,9 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
rtx *pop1)
   & GET_MODE_MASK (mode))
  + 1)) >= 0
  && const_op >> i == 0
- && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode
- && (TRULY_NOOP_TRUNCATION_MODES_P (tmode, GET_MODE (op0))
- || (REG_P (XEXP (op0, 0))
- && reg_truncated_to_mode (tmode, XEXP (op0, 0)
+ && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode)
{
- op0 = gen_lowpart (tmode, XEXP (op0, 0));
+ op0 = gen_lowpart_or_truncate (tmode, XEXP (op0, 0));
  continue;
}
 

2015-10-21  Steve Ellcey  
Andrew Pinski  

* gcc.dg/torture/pr67736.c: New test.
* gcc.dg/combine-subregs.c: New test.

diff --git a/gcc/testsuite/gcc.dg/combine-subregs.c 
b/gcc/testsuite/gcc.dg/combine-subregs.c
index e69de29..c480f88 100644
--- a/gcc/testsuite/gcc.dg/combine-subregs.c
+++ b/gcc/testsuite/gcc.dg/combine-subregs.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fexpensive-optimizations" } */
+
+#include 
+#include 
+
+void __attribute__ ((noinline))
+foo (uint64_t state, uint32_t last)
+{
+  if (state == last) abort ();
+}
+
+/* This function may do a bad comparision by trying to
+   use SUBREGS during the compare on machines where comparing
+   two registers always compares the entire register regardless
+   of mode.  */
+
+int __attribute__ ((noinline))
+compare (uint64_t state, uint32_t *last, uint8_t buf)
+{
+if (*last == ((state | buf) & 0x)) {
+   foo (state, *last);
+return 0;
+}
+return 1;
+}
+
+int
+main(int argc, char **argv) {
+uint64_t state = 0xF0100U;
+uint32_t last  = 0x101U;
+int ret= compare(state, &last, 0x01);
+if (ret != 0)
+   abort ();
+exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr67736.c 
b/gcc/testsuite/gcc.dg/torture/pr67736.c
index e69de29..b45874e 100644
--- a/gcc/testsuite/gcc.dg/torture/pr67736.c
+++ b/gcc/testsuite/gcc.dg/torture/pr67736.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#include 
+
+typedef unsigned long long uint64_t;
+void f(uint64_t *a, uint64_t aa) __attribute__((noinline));
+void f(uint64_t *a, uint64_t aa)
+{
+  uint64_t new_value = aa;
+  uint64_t old_value = *a;
+  int bit_size = 32;
+uint64_t mask = (uint64_t)(unsigned)(-1);
+uint64_t tmp = old_value & mask;
+new_value &= mask;
+/* On overflow we need to add 1 in the upper bits */
+if (tmp > new_value)
+new_value += 1ull<

[PATCH] Fix PR rtl-optimization/67736 in combine.c

2015-10-22 Thread Steve Ellcey


This is a new patch to fix PR rtl-optimization/67736.  My original patch
was MIPS specific and that was rejected and Andrew Pinski pointed me to
a patch he had sent out several years ago but that was never checked in.
I updated the patch and tested it with no regressions and would like to
check it in.  I addressed Eric Botcazou comments from the original patch
and also included the (updated) test case that Andrew had as well as one
I created.

OK to checkin?

Steve Ellcey
sell...@imgtec.com

My original patch and the follow up discussion is at:

https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02111.html

Andrews original patch is at:

https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00401.html



2015-10-21  Steve Ellcey  
Andrew Pinski  

PR rtl-optimization/67736
* combine.c (simplify_comparison): Use gen_lowpart_or_truncate instead
of gen_lowpart when we had a truncating and.


diff --git a/gcc/combine.c b/gcc/combine.c
index fd3e19c..7568ebb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11529,8 +11529,8 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx 
*pop1)
 tmode != GET_MODE (op0); tmode = GET_MODE_WIDER_MODE (tmode))
  if ((unsigned HOST_WIDE_INT) c0 == GET_MODE_MASK (tmode))
{
- op0 = gen_lowpart (tmode, inner_op0);
- op1 = gen_lowpart (tmode, inner_op1);
+ op0 = gen_lowpart_or_truncate (tmode, inner_op0);
+ op1 = gen_lowpart_or_truncate (tmode, inner_op1);
  code = unsigned_condition (code);
  changed = 1;
  break;
@@ -12048,12 +12048,9 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
rtx *pop1)
   & GET_MODE_MASK (mode))
  + 1)) >= 0
  && const_op >> i == 0
- && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode
- && (TRULY_NOOP_TRUNCATION_MODES_P (tmode, GET_MODE (op0))
- || (REG_P (XEXP (op0, 0))
- && reg_truncated_to_mode (tmode, XEXP (op0, 0)
+ && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode)
{
- op0 = gen_lowpart (tmode, XEXP (op0, 0));
+ op0 = gen_lowpart_or_truncate (tmode, XEXP (op0, 0));
      continue;
}
 


2015-10-21  Steve Ellcey  
Andrew Pinski  

PR rtl-optimization/67736
* gcc.dg/torture/pr67736.c: New test.
* gcc.dg/combine-subregs.c: New test.


diff --git a/gcc/testsuite/gcc.dg/combine-subregs.c 
b/gcc/testsuite/gcc.dg/combine-subregs.c
index e69de29..c480f88 100644
--- a/gcc/testsuite/gcc.dg/combine-subregs.c
+++ b/gcc/testsuite/gcc.dg/combine-subregs.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fexpensive-optimizations" } */
+
+#include 
+#include 
+
+void __attribute__ ((noinline))
+foo (uint64_t state, uint32_t last)
+{
+  if (state == last) abort ();
+}
+
+/* This function may do a bad comparision by trying to
+   use SUBREGS during the compare on machines where comparing
+   two registers always compares the entire register regardless
+   of mode.  */
+
+int __attribute__ ((noinline))
+compare (uint64_t state, uint32_t *last, uint8_t buf)
+{
+if (*last == ((state | buf) & 0x)) {
+   foo (state, *last);
+return 0;
+}
+return 1;
+}
+
+int
+main(int argc, char **argv) {
+uint64_t state = 0xF0100U;
+uint32_t last  = 0x101U;
+int ret= compare(state, &last, 0x01);
+if (ret != 0)
+   abort ();
+exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr67736.c 
b/gcc/testsuite/gcc.dg/torture/pr67736.c
index e69de29..b45874e 100644
--- a/gcc/testsuite/gcc.dg/torture/pr67736.c
+++ b/gcc/testsuite/gcc.dg/torture/pr67736.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#include 
+
+typedef unsigned long long uint64_t;
+void f(uint64_t *a, uint64_t aa) __attribute__((noinline));
+void f(uint64_t *a, uint64_t aa)
+{
+  uint64_t new_value = aa;
+  uint64_t old_value = *a;
+  int bit_size = 32;
+uint64_t mask = (uint64_t)(unsigned)(-1);
+uint64_t tmp = old_value & mask;
+new_value &= mask;
+/* On overflow we need to add 1 in the upper bits */
+if (tmp > new_value)
+new_value += 1ull<

Re: [PATCH] unconditionally compile most of the delay slot code

2015-10-22 Thread Steve Ellcey

On Wed, 2015-10-21 at 11:13 -0400, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders 
> 
> Hi,
> 
> $subject
> 
> bootstrapped+ regtested x86_64-linux-gnu, I wouldn't mind a second pair of 
> eyes
> on this one given its not totally trivial.
> 
> Trev
> 
> gcc/ChangeLog:
> 
> 2015-10-20  Trevor Saunders  
> 
>   * cfgrtl.c (pass_free_cfg::execute): Adjust.
>   * final.c (dbr_sequence_length): Always define.
>   (shorten_branches): Adjust.
>   * genattr-common.c (main): Always define DELAY_SLOTS.
>   * genattr.c (main): Unconditionally declare functions and define
>   macros related to delay slots.
>   * genattrtab.c (write_eligible_delay): Adjust.
>   (main): Always write out delay slot functions.
>   * opts.c (default_options_table): Adjust.
>   * reorg.c (redirect_with_delay_slots_safe_p): Likewise.
>   (redirect_with_delay_list_safe_p): Likewise.
>   (fill_simple_delay_slots): Likewise.
>   (fill_slots_from_thread): Likewise.
>   (make_return_insns): Likewise.
>   (dbr_schedule): Likewise.
>   (rest_of_handle_delay_slots): Likewise.
>   (pass_delay_slots::gate): Likewise.
>   * toplev.c (process_options): Likewise.

Trevor,

I am not sure if anyone else is having issues with this patch but it has
broken my MIPS build.  When building a cross compiler on x86 linux I
get:

build/genattrtab /scratch/sellcey/repos/build-bug/src/gcc/gcc/common.md 
/scratch/sellcey/repos/build-bug/src/gcc/gcc/config/mips/mips.md 
insn-conditions.md \
-Atmp-attrtab.c -Dtmp-dfatab.c -Ltmp-latencytab.c
genattrtab: Internal error: abort in write_eligible_delay, at
genattrtab.c:4524
make[1]: *** [s-attrtab] Error 1
make[1]: Leaving directory
`/scratch/sellcey/repos/build-bug/obj-mips-mti-linux-gnu/initial_gcc/gcc'
make: *** [all-gcc] Error 2
Error: Make command failed, stopping build.

Any thoughts on what I should look at?

Steve Ellcey
sell...@imgtec.com

Re: [PATCH] Fix PR rtl-optimization/67736 in combine.c

2015-10-22 Thread Steve Ellcey

On Thu, 2015-10-22 at 13:46 -0500, Segher Boessenkool wrote:

> One nit and maybe a problem:
> 
> > --- a/gcc/testsuite/gcc.dg/combine-subregs.c
> > +++ b/gcc/testsuite/gcc.dg/combine-subregs.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -fexpensive-optimizations" } */
> 
> -fexpensive-optimizations is default at -O2.

You are right, the original report I got mentioned this flag and then I
was comparing -fexpensive-optimization and -fno-expensive-optimization
so I didn't notice that it was on by default.  I will remove it from the
test.

> > +
> > +#include 
> 
> Does every target have that header?  Shouldn't it be ?
> 
> > --- a/gcc/testsuite/gcc.dg/torture/pr67736.c
> > +++ b/gcc/testsuite/gcc.dg/torture/pr67736.c
> > @@ -0,0 +1,32 @@
> > +/* { dg-do run } */
> > +
> > +#include 
> > +
> 
> And here you don't need inttypes at all?  Confused.
> 
> 
> Segher

Not including the include in pr67736.c was an oversight, I meant to
include it.  I didn't notice that Andrew had put in his own definition
of uint64_t into this test so it didn't need any includes, that is why
it still worked.  But you are right I should be using stdint.h instead
of inttypes.h.  It looks like most tests have:

/* { dg-do run { target { stdint_types } } } */
#include 

So I will use that in both tests and remove the local definition of
uint64_t from pr67736.c.

Steve Ellcey
sell...@imgtec.com

[Patch] Update email address

2015-10-23 Thread Steve Ellcey


I am checking in this patch to update my email address.  The sell...@mips.com
address still works but I have been using sell...@imgtec.com recently as
that is more 'official'.

Steve Ellcey
sell...@imgtec.com



2015-10-23  Steve Ellcey  

* MAINTAINERS: Update email address.


diff --git a/MAINTAINERS b/MAINTAINERS
index 3db08f1..d2f7a52 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -66,7 +66,7 @@ i386 port Jan Hubicka 
 i386 port  Uros Bizjak 
 i386 vector ISA extns  Kirill Yukhin   
 ia64 port  Jim Wilson  
-ia64 port  Steve Ellcey
+ia64 port  Steve Ellcey
 iq2000 portNick Clifton
 lm32 port  Sebastien Bourdeauducq  
 m32c port  DJ Delorie  
@@ -130,7 +130,7 @@ DJGPP   DJ Delorie  

 freebsdAndreas Tobler  
 GNU/Hurd   Thomas Schwinge 
 hpux   John David Anglin   
-hpux   Steve Ellcey
+hpux   Steve Ellcey
 solarisRainer Orth 

 netbsd Jason Thorpe
 netbsd Krister Walfridsson

Re: [Patch, MIPS] Frame header optimization for MIPS (part 2)

2015-10-23 Thread Steve Ellcey

Just to follow up on this string, here is a new version of the patch
with the extraneous parenthesis removed.

Steve Ellcey
sell...@imgtec.com


2015-10-23  Steve Ellcey  

* frame-header-opt.c (gate): Check for optimize > 0.
(has_inlined_assembly): New function.
(needs_frame_header_p): Remove is_leaf_function check,
add argument type check.
(callees_functions_use_frame_header): Add is_leaf_function
and has_inlined_assembly calls..
(set_callers_may_not_allocate_frame): New function.
(frame_header_opt): Add is_leaf_function call, add
set_callers_may_not_allocate_frame call.
* config/mips/mips.c (mips_compute_frame_info): Add check
to see if callee saved regs can be put in frame header.
(mips_expand_prologue): Add check to see if step1 is zero,
fix cfa restores when using frame header to store regs.
(mips_can_use_return_insn): Check to see if registers are
stored in frame header.
* config/mips/mips.h (machine_function): Add
callers_may_not_allocate_frame and
use_frame_header_for_callee_saved_regs fields.
 
diff --git a/gcc/config/mips/frame-header-opt.c 
b/gcc/config/mips/frame-header-opt.c
index 7c7b1f2..2cf589d 100644
--- a/gcc/config/mips/frame-header-opt.c
+++ b/gcc/config/mips/frame-header-opt.c
@@ -79,7 +79,7 @@ public:
   /* This optimization has no affect if TARGET_NEWABI.   If optimize
  is not at least 1 then the data needed for the optimization is
  not available and nothing will be done anyway.  */
-  return TARGET_OLDABI && flag_frame_header_optimization;
+  return TARGET_OLDABI && flag_frame_header_optimization && optimize > 0;
 }
 
   virtual unsigned int execute (function *) { return frame_header_opt (); }
@@ -125,6 +125,29 @@ is_leaf_function (function *fn)
   return true;
 }
 
+/* Return true if this function has inline assembly code or if we cannot
+   be certain that it does not.  False if know that there is no inline
+   assembly.  */
+
+static bool
+has_inlined_assembly (function *fn)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+
+  /* If we do not have a cfg for this function be conservative and assume
+ it is may have inline assembly.  */
+  if (fn->cfg == NULL)
+return true;
+
+  FOR_EACH_BB_FN (bb, fn)
+for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  if (gimple_code (gsi_stmt (gsi)) == GIMPLE_ASM)
+   return true;
+
+  return false;
+}
+
 /* Return true if this function will use the stack space allocated by its
caller or if we cannot determine for certain that it does not.  */
 
@@ -136,20 +159,26 @@ needs_frame_header_p (function *fn)
   if (fn->decl == NULL)
 return true;
 
-  if (fn->stdarg || !is_leaf_function (fn))
+  if (fn->stdarg)
 return true;
 
   for (t = DECL_ARGUMENTS (fn->decl); t; t = TREE_CHAIN (t))
 {
   if (!use_register_for_decl (t))
- return true;
+   return true;
+
+  /* Some 64 bit types may get copied to general registers using the frame
+header, see mips_output_64bit_xfer.  Checking for SImode only may be
+ overly restrictive but it is gauranteed to be safe. */
+  if (DECL_MODE (t) != SImode)
+   return true;
 }
 
   return false;
 }
 
-/* Returns TRUE if the argument stack space allocated by function FN is used.
-   Returns FALSE if the space is needed or if the need for the space cannot
+/* Return true if the argument stack space allocated by function FN is used.
+   Return false if the space is needed or if the need for the space cannot
be determined.  */
 
 static bool
@@ -177,6 +206,8 @@ callees_functions_use_frame_header (function *fn)
  called_fn = DECL_STRUCT_FUNCTION (called_fn_tree);
  if (called_fn == NULL
  || DECL_WEAK (called_fn_tree) 
+ || has_inlined_assembly (called_fn)
+ || !is_leaf_function (called_fn)
  || !called_fn->machine->does_not_use_frame_header)
return true;
}
@@ -188,6 +219,41 @@ callees_functions_use_frame_header (function *fn)
   return false;
 }
 
+/* Set the callers_may_not_allocate_frame flag for any function which
+   function FN calls because FN may not allocate a frame header.  */
+
+static void
+set_callers_may_not_allocate_frame (function *fn)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  gimple *stmt;
+  tree called_fn_tree;
+  function *called_fn;
+
+  if (fn->cfg == NULL)
+return;
+
+  FOR_EACH_BB_FN (bb, fn)
+{
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ stmt = gsi_stmt (gsi);
+ if (is_gimple_call (stmt))
+   {
+ called_fn_tree = gimple_call_fndecl (stmt);
+ if (called_fn_tree != NULL)
+   {
+

[Patch] Change to argument promotion in fixed conversion library calls

2015-11-06 Thread Steve Ellcey

This is part one of a two part patch I want to checkin in order to improve
argument passing on MIPS.  Right now MIPS defines TARGET_PROMOTE_PROTOTYPES
as true and so short and char arguments are getting promoted in the caller and
callee functions.  The ABI requires callers to do the promotion and so
I want to remove the definition of TARGET_PROMOTE_PROTOTYPES which will
get rid of the promotion done in the callee.  This matches what LLVM and
the MIPS Greenhills compilers do.

When I made this change I had one regression in the GCC testsuite
(gcc.dg/fixed-point/convert-sat.c).  I tracked this down to the 
fact that emit_library_call_value_1 does not do any argument promotion
because it does not have the original tree type information for library
calls.  It only knows about modes.  I can't change emit_library_call_value_1
to do the promotion because it does not know whether to do a signed or
unsigned promotion, but expand_fixed_convert could do the conversion
before calling emit_library_call_value_1 and that is what this patch does.

Note that if a target uses one of the default argument promotion functions
like default_promote_function_mode or
default_promote_function_mode_always_promote, this change will have no
affect because they call promote_mode and if promote_mode is called with
a NULL_TREE as the first argument it returns the original mode.  For target
specific promote functions, this change should just cause the promotion
to be done in expand_fixed_convert instead of in emit_library_call_value_1
and with the proper setting of unsignedp.

This patch by itself will not fix my MIPS bug because it uses one of the
default promote functions.  Part two of this patch will be to define the
TARGET_PROMOTE_FUNCTION_MODE macro on MIPS so that it promotes QI and HI
to SI even if the type tree is NULL and then everything will work.

The 'real' long term fix for this problem is to have tree types for builtin
functions so the proper promotions can always be done but that is a fairly
large change that I am not willing to tackle right now and it could probably
not be done in time for GCC 6.0 anyway.

See https://gcc.gnu.org/ml/gcc/2015-10/msg00234.html for more discussion
of this change.

Is this part of the patch OK to checkin?  Tested on mips-mti-linux-gnu.

Steve Ellcey
sell...@imgtec.com


2015-11-06  Steve Ellcey  

* optabs.c (expand_fixed_convert): Call promote_function_mode.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index fdcdc6a..964e92a 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -4880,6 +4880,24 @@ expand_fixed_convert (rtx to, rtx from, int uintp, int 
satp)
   libfunc = convert_optab_libfunc (tab, to_mode, from_mode);
   gcc_assert (libfunc);
 
+  if (SCALAR_INT_MODE_P (from_mode))
+{
+  /*  If we need to promote the integer function argument we need to do
+ it here instead of inside emit_library_call_value because here we
+ know if we should be doing a signed or unsigned promotion.  */
+
+  machine_mode arg_mode;
+  int unsigned_p = 0;
+
+  arg_mode = promote_function_mode (NULL_TREE, from_mode,
+   &unsigned_p, NULL_TREE, 0);
+  if (arg_mode != from_mode)
+   {
+ from = convert_to_mode (arg_mode, from, uintp);
+ from_mode = arg_mode;
+   }
+}
+
   start_sequence ();
   value = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST, to_mode,
   1, from, from_mode);

Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-06 Thread Steve Ellcey

On Fri, 2015-11-06 at 20:14 +0100, Bernd Schmidt wrote:

> 
> > +  if (SCALAR_INT_MODE_P (from_mode))
> > +{
> > +  /*  If we need to promote the integer function argument we need to do
> 
> Extra space at the start of the comment.
> 
> > + it here instead of inside emit_library_call_value because here we
> > + know if we should be doing a signed or unsigned promotion.  */
> > +
> > +  machine_mode arg_mode;
> > +  int unsigned_p = 0;
> > +
> > +  arg_mode = promote_function_mode (NULL_TREE, from_mode,
> > +   &unsigned_p, NULL_TREE, 0);
> > +  if (arg_mode != from_mode)
> > +   {
> > + from = convert_to_mode (arg_mode, from, uintp);
> > + from_mode = arg_mode;
> > +   }
> > +}
> 
> Move this into a separate function (prepare_libcall_arg)? I'll think 
> about it over the weekend and let others chime in if they want, but I 
> think I'll probably end up approving it with that change.
> 
> 
> Bernd

Are you thinking of a simple function that is called on all targets or a
target specific function?  Maybe a target specific function would be
safer.

We could have:

from = targetm.prepare_libcall_arg (from, uintp);
from_mode = GET_MODE (from);

And have the default prepare_libcall_arg simply return the original
'from' rtx value.  We could also pass some other arguments, maybe the
library call rtx (fun) and an integer to specify what argument this is
(1, 2, 3, ...) if we wanted to try and generalize it even more.

Steve Ellcey
sell...@imgtec.com

Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-09 Thread Steve Ellcey

On Fri, 2015-11-06 at 20:29 +0100, Bernd Schmidt wrote:
> On 11/06/2015 08:27 PM, Steve Ellcey wrote:
> >
> > Are you thinking of a simple function that is called on all targets or a
> > target specific function?  Maybe a target specific function would be
> > safer.
> 
> No, I think just what you have there is probably sufficient.
> 
> 
> Bernd

Bernd,

Here is a version with the code moved into a new function.  How does
this look?

2015-11-09  Steve Ellcey  

* optabs.c (prepare_libcall_arg): New function.
(expand_fixed_convert): Add call to prepare_libcall_arg.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index fdcdc6a..fb25f90 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -4838,6 +4838,33 @@ expand_fix (rtx to, rtx from, int unsignedp)
 }
 }
 
+
+/* Promote integer arguments for a libcall if necessary.
+   emit_library_call_value cannot do the promotion because it does not
+   know if it should do a signed or unsigned promotion.  This is because
+   there are no tree types defined for libcalls.  */
+
+static rtx
+prepare_libcall_arg (rtx arg, int uintp)
+{
+  machine_mode mode = GET_MODE (arg);
+  machine_mode arg_mode;
+  if (SCALAR_INT_MODE_P (mode))
+{
+  /*  If we need to promote the integer function argument we need to do
+ it here instead of inside emit_library_call_value because in
+ emit_library_call_value we don't know if we should do a signed or
+ unsigned promotion.  */
+
+  int unsigned_p = 0;
+  arg_mode = promote_function_mode (NULL_TREE, mode,
+   &unsigned_p, NULL_TREE, 0);
+  if (arg_mode != mode)
+   return convert_to_mode (arg_mode, arg, uintp);
+}
+return arg;
+}
+
 /* Generate code to convert FROM or TO a fixed-point.
If UINTP is true, either TO or FROM is an unsigned integer.
If SATP is true, we need to saturate the result.  */
@@ -4880,6 +4907,9 @@ expand_fixed_convert (rtx to, rtx from, int uintp, int 
satp)
   libfunc = convert_optab_libfunc (tab, to_mode, from_mode);
   gcc_assert (libfunc);
 
+  from = prepare_libcall_arg (from, uintp);
+  from_mode = GET_MODE (from);
+
   start_sequence ();
   value = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST, to_mode,
   1, from, from_mode);

Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-09 Thread Steve Ellcey

On Mon, 2015-11-09 at 21:47 +0100, Bernd Schmidt wrote:
> On 11/09/2015 05:59 PM, Steve Ellcey wrote:
> > Here is a version with the code moved into a new function.  How does
> > this look?
> >
> > 2015-11-09  Steve Ellcey  
> >
> > * optabs.c (prepare_libcall_arg): New function.
> > (expand_fixed_convert): Add call to prepare_libcall_arg.
> 
> Hold on a moment - I see that emit_library_call_value_1 calls 
> promote_function_mode for arguments. Can you investigate why that 
> doesn't do what you need?
> 
> 
> Bernd

emit_library_call_value_1 has no way of knowing if the promotion should
be signed or unsigned because it has a mode (probably QImode or HImode)
that it knows may need to be promoted to SImode but it has no way to
know if that should be a signed or unsigned promotion because it has no
tree type information about the library call argument types.

Right now it guesses based on the return type but it may guess wrong
when converting an unsigned int to a signed fixed type or visa versa.

By doing the promotion in expand_fixed_convert GCC can use the uintp
argument to ensure that the signedness of the promotion is done
correctly.  We could pass that argument into emit_library_call_value_1
so it can do the correct promotion but that would require changing the
argument list for emit_library_call and emit_library_call_value_1 and
changing all the other call locations for those functions and that
seemed like overkill.

Steve Ellcey

[Patch, MIPS] Remove definition of TARGET_PROMOTE_PROTOTYPES

2015-11-10 Thread Steve Ellcey

This patch removes the definition of TARGET_PROMOTE_PROTOTYPES from MIPS,
where it was defined as true, so that it now defaults to false.

Currently MIPS does prototype promotion in the caller and the callee and this
patch removes the TARGET_PROMOTE_PROTOTYPES macro definition so that
the promotion is only done in the caller (due to PROMOTE_MODE being defined).
This does not break the ABI which requires the caller to do promotions anyway.
(See https://gcc.gnu.org/ml/gcc/2015-10/msg00223.html).  This change also
causes GCC to match what the LLVM and Greenhills compilers already do on MIPS.

After removing this macro I had three regressions, two were just tests that
needed changing but one was a bug (gcc.dg/fixed-point/convert-sat.c).
This test was calling a library function to convert a signed char into an
unsigned fixed type and because we don't have tree type information about
libcalls GCC cannot do the ABI required type promotion on those calls that it
does on normal user defined calls.  In fact promote_mode in explow.c expicitly
returns without doing anything if no type is given it.  Before this change it
didn't matter on MIPS because the callee did the same promotion that the caller
was supposed to have done before using the argument.  Now that callee code is
gone we depend on the caller doing the correct promotion and that was not
happening.

I submitted and checked in another patch to optabs.c
(See https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00704.html) to provide
me with the infrastructure to do the correct type promotion in expand_fixed
and this patch redefines TARGET_PROMOTE_FUNCTION_MODE to return the needed
promotion mode even when type is NULL_TREE.  When type is set it does
the same thing as it used to do.  This change allows me to remote the
definition of TARGET_PROMOTE_PROTOTYPES without the convert-sat.c test
failing.

The two tests that I changed are gcc.dg/tree-ssa/ssa-fre-4.c and
gcc.target/mips/ext-2.c.  ssa-fre-4.c no longer applies to MIPS now
that we do not define TARGET_PROMOTE_PROTOTYPES so I removed the MIPS
target from it.  ext-2.c now generates an srl instruction instead of a
dext instruction but the number of instructions has not changed and I
updated the scan checks.

Tested on mips-mti-linux-gnu with no unfixed regressions.  OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-11-10  Steve Ellcey  

* config/mips/mips.c (mips_promote_function_mode): New function.
(TARGET_PROMOTE_FUNCTION_MODE): Define as above function.
(TARGET_PROMOTE_PROTOTYPES): Remove.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 9880b23..e9c3830 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -19760,6 +19760,32 @@ mips_ira_change_pseudo_allocno_class (int regno, 
reg_class_t allocno_class)
 return GR_REGS;
   return allocno_class;
 }
+
+/* Implement TARGET_PROMOTE_FUNCTION_MODE */
+
+/* This function is equivalent to default_promote_function_mode_always_promote
+   except that it returns a promoted mode even if type is NULL_TREE.  This is
+   needed by libcalls which have no type (only a mode) such as fixed conversion
+   routines that take a signed or unsigned char/short argument and convert it
+   to a fixed type.  */
+
+static machine_mode
+mips_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
+machine_mode mode,
+int *punsignedp ATTRIBUTE_UNUSED,
+const_tree fntype ATTRIBUTE_UNUSED,
+int for_return ATTRIBUTE_UNUSED)
+{
+  int unsignedp;
+
+  if (type != NULL_TREE)
+return promote_mode (type, mode, punsignedp);
+
+  unsignedp = *punsignedp;
+  PROMOTE_MODE (mode, unsignedp, type);
+  *punsignedp = unsignedp;
+  return mode;
+}
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -19864,10 +19890,7 @@ mips_ira_change_pseudo_allocno_class (int regno, 
reg_class_t allocno_class)
 #define TARGET_GIMPLIFY_VA_ARG_EXPR mips_gimplify_va_arg_expr
 
 #undef  TARGET_PROMOTE_FUNCTION_MODE
-#define TARGET_PROMOTE_FUNCTION_MODE 
default_promote_function_mode_always_promote
-#undef TARGET_PROMOTE_PROTOTYPES
-#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true
-
+#define TARGET_PROMOTE_FUNCTION_MODE mips_promote_function_mode
 #undef TARGET_FUNCTION_VALUE
 #define TARGET_FUNCTION_VALUE mips_function_value
 #undef TARGET_LIBCALL_VALUE




2015-11-10  Steve Ellcey  

* gcc.dg/tree-ssa/ssa-fre-4.c: Remove mips*-*-* target.
* gcc.target/mips/ext-2.c: Update scan checks.


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
index 02b6719..5a7588f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
@@ -1,6 +1,6 @@
 /* If the target returns false for TARGET_PROMOTE_PROTOTYPES, then there
will be no casts for FRE to eliminate and the test will fail.  */
-/* { dg-

[Patch] Fix bug for frame instructions in annulled delay slots

2015-12-07 Thread Steve Ellcey

While doing more testing of the MIPS frame header optimization I checked
in earlier, I found I could not build the elf toolchain when I turned the
optimization on by default.  This is because GCC was putting a frame related
instruction (the save of the return address) into an annulled delay slot.
This only happens when the function prologue is moved after a branch by the
shrink wrap optimization and it caused create_cfi_notes to abort because the
code that generates the CFI information cannot handle stack related annulled
instructions.

This never occurred when I did my testing with the linux toolchain because
I did not use -Os (which caused the use of instructions with annulled delay
slots) and because -O2 does more inlining than -Os which meant there were
fewer places where the return address could be saved in the callers frame
header without the callee having to allocate its own stack space.

The fix for this is not in mips specific code but in reorg.c where we
restrict GCC from putting any frame related instructions into annulled
delay slots.  There was already code in reorg.c to prevent frame related
instructions from going into normal delay slots but that code did not seem
to cover all the unnulled delay slot situations.  This patch extends the
code to not put frame related instructions into annulled delay slots.

Tested on MIPS with linux and elf toolchain builds and testsuite runs.

OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-12-07  Steve Ellcey  

* reorg.c (optimize_skip): Do not put frame related instructions
in annulled delay slots.
(steal_delay_list_from_target): Ditto.
(fill_slots_from_thread): Ditto.


diff --git a/gcc/reorg.c b/gcc/reorg.c
index cc68d6b..bab5f49 100644
--- a/gcc/reorg.c
+++ b/gcc/reorg.c
@@ -739,6 +739,7 @@ optimize_skip (rtx_jump_insn *insn, vec 
*delay_list)
   || recog_memoized (trial) < 0
   || (! eligible_for_annul_false (insn, 0, trial, flags)
  && ! eligible_for_annul_true (insn, 0, trial, flags))
+  || RTX_FRAME_RELATED_P (trial)
   || can_throw_internal (trial))
 return;
 
@@ -1126,7 +1127,13 @@ steal_delay_list_from_target (rtx_insn *insn, rtx 
condition, rtx_sequence *seq,
  trial, flags)))
{
  if (must_annul)
-   used_annul = 1;
+   {
+ /* Frame related instructions cannot go into annulled delay
+slots, it messes up the dwarf info.  */
+ if (RTX_FRAME_RELATED_P (trial))
+   return;
+ used_annul = 1;
+   }
  rtx_insn *temp = copy_delay_slot_insn (trial);
  INSN_FROM_TARGET_P (temp) = 1;
  add_to_delay_list (temp, &new_delay_list);
@@ -2464,9 +2471,9 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx 
condition,
  if (eligible_for_delay (insn, *pslots_filled, trial, flags))
goto winner;
}
- else if (0
-  || (ANNUL_IFTRUE_SLOTS && ! thread_if_true)
-  || (ANNUL_IFFALSE_SLOTS && thread_if_true))
+ else if (!RTX_FRAME_RELATED_P (trial) \
+  && ((ANNUL_IFTRUE_SLOTS && ! thread_if_true)
+   || (ANNUL_IFFALSE_SLOTS && thread_if_true)))
{
  old_trial = trial;
  trial = try_split (pat, trial, 0);



2015-12-07  Steve Ellcey  

* gcc.target/mips/wrap-delay.c: New test.


diff --git a/gcc/testsuite/gcc.target/mips/wrap-delay.c 
b/gcc/testsuite/gcc.target/mips/wrap-delay.c
index e69de29..1332a68 100644
--- a/gcc/testsuite/gcc.target/mips/wrap-delay.c
+++ b/gcc/testsuite/gcc.target/mips/wrap-delay.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-g -mframe-header-opt -mbranch-likely" } */
+
+/* GCC was generating an ICE with the above options and -Os because
+   it was putting the save of $31 in two annulled delay slots but
+   there was only one restore since only one of the two saves could be
+   executed.  This was correct code but it confused dwarf2cfi and
+   caused it to ICE when using the -g option.  */
+
+
+enum demangle_component_type
+{
+   DEMANGLE_COMPONENT_TRINARY_ARG2,
+};
+struct demangle_component
+{
+  enum demangle_component_type type;
+};
+struct d_info
+{
+  int next_comp;
+  int num_comps;
+};
+struct demangle_component * d_make_empty (struct d_info *di)
+{
+  if (di->next_comp >= di->num_comps) return ((void *)0);
+  ++di->next_comp;
+}
+struct demangle_component *d_make_comp (
+   struct d_info *di,
+   enum demangle_component_type type,
+   struct demangle_component *left)
+{
+  struct demangle_component *p;
+  switch (type)
+{
+case DEMANGLE_COMPONENT_TRINARY_ARG2:
+  if (left == ((void *)0)) return ((void *)0);
+}
+  p = d_make_empty (di);
+  p->type = type;
+}

Re: [Patch] Fix bug for frame instructions in annulled delay slots

2015-12-07 Thread Steve Ellcey

On Mon, 2015-12-07 at 20:28 +0100, Bernd Schmidt wrote:
> On 12/07/2015 07:54 PM, Steve Ellcey  wrote:
> >   if (must_annul)
> > -   used_annul = 1;
> > +   {
> > + /* Frame related instructions cannot go into annulled delay
> > +slots, it messes up the dwarf info.  */
> > + if (RTX_FRAME_RELATED_P (trial))
> > +   return;
> 
> Don't you need to use break rather than return?

I am not sure about this.  There is an earlier if statement in the loop
that does a 'return' instead of a break (or continue) and there is a 
return in the 'else' part of the if that sets must_annul.  Both of these
are inside the loop that looks at all the instructions in the sequence
'seq'.  I think the code is looking at all the instructions in the
sequence and if any of them fail one of the tests in the loop (in this
case require annulling) then we can't handle any of the instructions in
the sequence and so we return immediately without putting any of the
instructions from 'seq' in the delay slot.  I believe a frame related
instruction in an annulled branch needs to be handled that way too.

> 
> > + else if (!RTX_FRAME_RELATED_P (trial) \
> 
> Stray backslash.

That is easily fixed.

> Other than that I think this is OK. There are some preexisting tests for 
> frame related insns already in this code.
> 
> 
> Bernd
>

Re: [Patch] Fix bug for frame instructions in annulled delay slots

2015-12-07 Thread Steve Ellcey

On Mon, 2015-12-07 at 20:59 +0100, Bernd Schmidt wrote:
> On 12/07/2015 08:43 PM, Steve Ellcey wrote:
> > I am not sure about this.  There is an earlier if statement in the loop
> > that does a 'return' instead of a break (or continue) and there is a
> > return in the 'else' part of the if that sets must_annul.  Both of these
> > are inside the loop that looks at all the instructions in the sequence
> > 'seq'.  I think the code is looking at all the instructions in the
> > sequence and if any of them fail one of the tests in the loop (in this
> > case require annulling) then we can't handle any of the instructions in
> > the sequence and so we return immediately without putting any of the
> > instructions from 'seq' in the delay slot.  I believe a frame related
> > instruction in an annulled branch needs to be handled that way too.
> 
> Ah, I think I was looking at the other function that has the same 
> must_annul test (steal_delay_list_from_fallthrough). The patch is ok 
> without the backslash. Maybe the other function ought to be changed as 
> well though?
> 
> 
> Bernd

That would seem reasonable, though I don't have a test case for where
the change to that routine would actually make a difference in the
compilation.  I'll create a patch and test it to make sure it doesn't
cause any problems and then submit it as a follow-up to this change.

Steve Ellcey
sell...@imgtec.com

Re: [Patch] Fix bug for frame instructions in annulled delay slots

2015-12-07 Thread Steve Ellcey

On Mon, 2015-12-07 at 12:30 -0700, Jeff Law wrote:
> On 12/07/2015 12:28 PM, Bernd Schmidt wrote:
> > On 12/07/2015 07:54 PM, Steve Ellcey  wrote:
> >> if (must_annul)
> >> -used_annul = 1;
> >> +{
> >> +  /* Frame related instructions cannot go into annulled delay
> >> + slots, it messes up the dwarf info.  */
> >> +  if (RTX_FRAME_RELATED_P (trial))
> >> +return;
> >
> > Don't you need to use break rather than return?
> >
> >> +  else if (!RTX_FRAME_RELATED_P (trial) \
> >
> > Stray backslash
> > Other than that I think this is OK. There are some preexisting tests for
> > frame related insns already in this code.
> Also note there's probably port cleanup that could happen once this goes 
> in.  IIRC the PA port (for example) explicitly disallows frame related 
> insns from many (most, all?) delay slots.  Other targets may be doing 
> something similar.
> 
> jeff

If I had remembered/seen that code in pa.md earlier I might have fixed
MIPS the same way.  Oh well, I guess changing reorg.c is the better way
to do it since it can be shared now.

It looks like steal_delay_list_from_fallthrough is missing a lot of
FRAME_RELATED checks (and other things) that
steal_delay_list_from_target has in it (not just my changes).  I think
that means most of the work/fixing has been on machines like MIPS where
the code in the delay slot is either always executed or only
executed when the branch is taken (vs. untaken).  I think
steal_delay_list_from_fallthrough is only going to be used when the
delay slot is executed on a branch-not-taken.  Are there any machines
like that?

Steve Ellcey
sell...@imgtec.com

[RFC] Request for comments on ivopts patch

2015-12-08 Thread Steve Ellcey

I have an ivopts optimization question/proposal.  When compiling the
attached program the ivopts pass prefers the original ivs over new ivs
and that causes us to generate less efficient code on MIPS.  It may
affect other platforms too.

The Source code is a C strcmp:

int strcmp (const char *p1, const char *p2)
{
  const unsigned char *s1 = (const unsigned char *) p1;
  const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;
  do {
  c1 = (unsigned char) *s1++;
  c2 = (unsigned char) *s2++;
  if (c1 == '\0') return c1 - c2;
  } while (c1 == c2);
  return c1 - c2;
}

Currently the code prefers the original ivs and so it generates
code that increments s1 and s2 before doing the loads (and uses
a -1 offset):

  :
  # s1_1 = PHI 
  # s2_2 = PHI 
  s1_6 = s1_1 + 1;
  c1_8 = MEM[base: s1_6, offset: 4294967295B];
  s2_9 = s2_2 + 1;
  c2_10 = MEM[base: s2_9, offset: 4294967295B];
  if (c1_8 == 0)
goto ;
  else
goto ;

If I remove the cost increment for non-original ivs then GCC
does the loads before the increments:

 :
  # ivtmp.6_17 = PHI 
  # ivtmp.7_21 = PHI 
  _25 = (void *) ivtmp.6_17;
  c1_8 = MEM[base: _25, offset: 0B];
  _26 = (void *) ivtmp.7_21;
  c2_10 = MEM[base: _26, offset: 0B];
  if (c1_8 == 0)
goto ;
  else
goto ;
.
.
  :
  ivtmp.6_14 = ivtmp.6_17 + 1;
  ivtmp.7_23 = ivtmp.7_21 + 1;
  if (c1_8 == c2_10)
goto ;
  else
goto ;


This second case (without the preference for the original IV)
generates better code on MIPS because the final assembly 
has the increment instructions between the loads and the tests
of the values being loaded and so there is no delay (or less delay)
between the load and use.  It seems like this could easily be 
the case for other platforms too so I was wondering what people
thought of this patch:


2015-12-08  Steve Ellcey  

* tree-ssa-loop-ivopts.c (determine_iv_cost): Remove preference for
original ivs.


diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 98dc451..26daabc 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5818,14 +5818,6 @@ determine_iv_cost (struct ivopts_data *data, struct 
iv_cand *cand)
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
 
-  /* Prefer the original ivs unless we may gain something by replacing it.
- The reason is to make debugging simpler; so this is not relevant for
- artificial ivs created by other optimization passes.  */
-  if (cand->pos != IP_ORIGINAL
-  || !SSA_NAME_VAR (cand->var_before)
-  || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
-cost++;
-
   /* Prefer not to insert statements into latch unless there are some
  already (so that we do not create unnecessary jumps).  */
   if (cand->pos == IP_END

[Patch] Fix for MIPS PR target/65604

2015-12-09 Thread Steve Ellcey

This is a MIPS patch to make mips_output_division obey the
-fno-delayed-branch flag.  Right now, with mips1 and -mcheck-zero-division,
the division instruction is put into the bne delay slot even when
-fno-delayed-branch is specified.  This change uses a similar strategy
to MIPS16 where we do the division first and then do the zero test
while the division is being calculated.  Tested with mips1 runs and by
inspecting the code that is output.

OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-12-09  Steve Ellcey  

PR target/65604
* config/mips/mips.c (mips_output_division): Check flag_delayed_branch.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 6145944..8444a91 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -13687,9 +13687,17 @@ mips_output_division (const char *division, rtx 
*operands)
}
   else
{
- output_asm_insn ("%(bne\t%2,%.,1f", operands);
- output_asm_insn (s, operands);
- s = "break\t7%)\n1:";
+ if (flag_delayed_branch)
+   {
+ output_asm_insn ("%(bne\t%2,%.,1f", operands);
+ output_asm_insn (s, operands);
+ s = "break\t7%)\n1:";
+   }
+ else
+   {
+ output_asm_insn (s, operands);
+ s = "bne\t%2,%.,1f\n\tnop\n\tbreak\t7\n1:";
+   }
}
 }
   return s;

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-10-08 Thread Steve Ellcey

On Mon, 2019-10-07 at 12:04 +0200, Richard Biener wrote:
> On Tue, Oct 1, 2019 at 1:48 PM Dmitrij Pochepko
>  wrote:
> > 
> > Hi Richard,
> > 
> > I updated patch according to all your comments.
> > Also bootstrapped and tested again on x86_64-pc-linux-gnu and
> > aarch64-linux-gnu, which took some time.
> > 
> > attached v3.
> 
> OK.
> 
> Thanks,
> Richard.

Dmitrij,

I checked in this patch for you.

Steve Ellcey
sell...@marvell.com

Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-06 Thread Steve Ellcey

Thanks for the feedback Kyrill.  I have updated my patch and attached
the new version to this email.  The one change I did not make was to
remove load_pair_dw_tftf and store_pair_dw_tftf and use the
load_pair and vec_store_pair
patterns.  Having to add two new iterators to remove two instructions
didn't seem like much of an advantage and I liked having the names
for tf to match the other load_pair_dw/store_pair_dw instructions.
If you feel strongly about this I can go ahead and make that change
though.

With respect to the new tests iterating over the various options, I am
not sure how that works but it does.  I copied the aarch64-torture.exp
file from one of the other targets and verified that it ran the
tests with -O0, -O1, -O2, '-O3 -g', -Os, '-O2 -flto -fno-use-linker-
plugin -flto-partition=none', and '-O2 -flto -fuse-linker-plugin -fno-
fat-lto-objects'.

Steve Ellcey
sell...@cavium.com


2018-08-06  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.


2018-08-06  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..7ab2a60 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,31 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fnd

Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-07 Thread Steve Ellcey

I have a question about my own patch.  In doing testing I realized
that if I use the aarch64_vector_pcs attribute on a platform without
SIMD (i.e. -march=armv8.1-a+nosimd) I get an ICE.  That is obviously
not what we want but I was wondering what the right behaviour is.

We certainly don't want to generate code for a SIMD function on a
target that does not support SIMD, but is it OK to declare something
with the simd or aarch64_vector_pcs attribute if it is never used or called?
Or should even the declaration of a function with the simd/aarch64_vector_pcs
attribute result in a compilation error?  I don't think we want to
complain about simd declarations because those could show up in the system
headers like mathcalls.h even when not used.  Or should those be ifdef'ed
in some way based on the __ARM_NEON macro so they don't show up when not
compiling for SIMD?

Any ideas on where should this check be done, I thought the
TARGET_OPTION_VALID_ATTRIBUTE_P hook might be the right place, but that
seems to be specific to the 'target' attribute only, not attributes
in general.

Steve Ellcey
sell...@cavium.com

Re: [aarch64}: added variable issue rate feature for falkor

2018-08-15 Thread Steve Ellcey

On Wed, 2018-08-15 at 12:45 +0200, Kai Tietz wrote:
> 
> > I believe the scheduler provides hooks specifically for storing
> > backend-specific scheduling state so we should
> > avoid creating such static variables in aarch64.c. Can you use the
> > TARGET_SCHED_*_SCHED_CONTEXT family of hooks here?
> > Then it will be up to the scheduler midend to keep track of the
> > state and
> > between basic blocks, functions etc.
> I think you refer to the ppc implementation. But if you take a closer
> look to it, you will see that nevertheless such an implementation will
> require global variables.
> So I am not really sure it is worth to introduce the ..._SCHED_CONTEXT
> API to avoid one global variable by introducing at least two others.
> Neverthelss I am admit that making use of SCHED_CONTEXT could be a
> general nice to have, but not necessarily a gain in that case.

I think having the SCHED_CONTEXT infrastructure could be helpful to me
for Thunderx2.  I am looking at an issue that I
think TARGET_SCHED_VARIABLE_ISSUE will help with so having the general
API available would be useful.

I am looking at the lbm benchmark from SPEC and I believe it is
overscheduling floating point instructions, if I cut the issue rate
from 4 to 2, I get better performance on this one benchmark, but not
on others.

According to the Thunderx2 description, we can dispatch up to 4 uops
per clock, but only up to 2 FP/SIMD uops.  I know dispatch is different
than issue but I don't think that GCC models both.  For my patch
I would probably want to save the previous instruction scheduled so
that if it and the current one are both FP/SIMD ops, then that is all
we can issue.  I might need to save several instructions,  not just the
last one, to get everything correct.

Steve Ellcey
sell...@cavium.com

Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-20 Thread Steve Ellcey

On Tue, 2018-08-07 at 12:15 -0500, Segher Boessenkool wrote:

> > +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } }
> > */
> That's [0-7] but maybe you find [01234567] more readable here.

Segher,  I fixed all the issues you pointed out except this one.  Since
there are some uses of non consecutive numbers in one of the tests I 
decided to leave [01234567] instead of using [0-7].  Here is the 
latest version of the patch, there are no semantic changes, just
syntactical ones to address the issues that you pointed out.

Steve Ellcey
sell...@cavium.com

2018-08-20  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for simd function.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM) New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
("storewb_pair_): Ditto.

2018-08-20  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);

@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);

+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..9cffec8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;

+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)

 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,31 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }

+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl;
+}
+
+/* Return the mode a register save/restore should use.  DImode for integer
+   registers, DFmode for FP registers in non-SIMD func

ToT build problem on Aarch64 after cfg.h change

2018-08-24 Thread Steve Ellcey

Richard,

I am having a problem building GCC after this patch:

commit 2515797e5db67076d6cf7f3f185757c841f79edf
Author: rguenth 
Date:   Fri Aug 24 11:17:16 2018 +

2018-08-24  Richard Biener  
   
* cfg.h (struct control_flow_graph): Add edge_flags_allocated and
bb_flags_allocated members.
(auto_flag): New RAII class for allocating flags.
(auto_edge_flag): New RAII class for allocating edge flags.
(auto_bb_flag): New RAII class for allocating bb flags.
* cfgloop.c (verify_loop_structure): Allocate temporary edge
flag dynamically.
* cfganal.c (dfs_enumerate_from): Remove use of visited sbitmap
in favor of temporarily allocated BB flag.
* hsa-brig.c: Re-order includes.
* hsa-dump.c: Likewise.
* hsa-regalloc.c: Likewise.
* print-rtl.c: Likewise.
* profile-count.c: Likewise.

The failure is:

In file included from 
/home/sellcey/gcc-m/src/gcc/gcc/config/aarch64/aarch64-speculation.cc:28:0:
/home/sellcey/gcc-m/src/gcc/gcc/cfg.h: In constructor 
‘auto_edge_flag::auto_edge_flag(function*)’:
/home/sellcey/gcc-m/src/gcc/gcc/cfg.h:172:22: error: invalid use of incomplete 
type ‘struct function’
 : auto_flag (&fun->cfg->edge_flags_allocated) {}

It looks like this may be Aarch64 specific build problem since it is 
compiling a platform specific file.  Is there just a missing include?

Steve Ellcey
sell...@cavium.com

Re: Fix MIPS builds

2018-08-27 Thread Steve Ellcey

On Sun, 2018-08-26 at 20:56 -0600, Jeff Law wrote:
> MIPS builds have been failing to build due to trying to use an
> incomplete struct function type.  The uses are coming from cfg.h which
> typically isn't included in the target files.
> 
> Fixed by including "backend.h" and removing the "cfg.h" inclusion.
> 
> THere's likely other targets (aarch64?) that are suffering from a
> similar problem and could likely be fixed in a similar way.  I'll look
> at those once I'm confident all my MIPS targets are building again.
> 
> jeff

I tried this change in gcc/config/aarch64/aarch64-speculation.cc
(changing include of cfg.h to backend.h) and it fixed my build problem.

Steve Ellcey

Re: Fix MIPS builds

2018-08-27 Thread Steve Ellcey

On Mon, 2018-08-27 at 10:27 -0600, Jeff Law wrote:
> External Email
> 
> On 08/27/2018 09:55 AM, Steve Ellcey wrote:
> > 
> > On Sun, 2018-08-26 at 20:56 -0600, Jeff Law wrote:
> > > 
> > > MIPS builds have been failing to build due to trying to use an
> > > incomplete struct function type.  The uses are coming from cfg.h which
> > > typically isn't included in the target files.
> > > 
> > > Fixed by including "backend.h" and removing the "cfg.h"
> > > inclusion.
> > > 
> > > THere's likely other targets (aarch64?) that are suffering from a
> > > similar problem and could likely be fixed in a similar way.  I'll look
> > > at those once I'm confident all my MIPS targets are building
> > > again.
> > > 
> > > jeff
> > I tried this change in gcc/config/aarch64/aarch64-speculation.cc
> > (changing include of cfg.h to backend.h) and it fixed my build problem.

> Go ahead and commit it :-)  backend.h seems like an appropriate thing to
> include there.
> 
> jeff

OK,  here is the patch I will check in:

2018-08-27  Steve Ellcey  

* config/aarch64/aarch64-speculation.cc: Replace include of cfg.h
with include of backend.h.


diff --git a/gcc/config/aarch64/aarch64-speculation.cc 
b/gcc/config/aarch64/aarch64-speculation.cc
index 2dd06ae..3cd9ba0 100644
--- a/gcc/config/aarch64/aarch64-speculation.cc
+++ b/gcc/config/aarch64/aarch64-speculation.cc
@@ -25,7 +25,7 @@
 #include "rtl.h"
 #include "tree-pass.h"
 #include "profile-count.h"
-#include "cfg.h"
+#include "backend.h"
 #include "cfgbuild.h"
 #include "print-rtl.h"
 #include "cfgrtl.h"

Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-31 Thread Steve Ellcey



Ping.  Any feedback from the Aarch64 maintainers?

Steve Ellcey
sell...@cavium.com



On Mon, 2018-08-20 at 10:37 -0700, Steve Ellcey wrote:
> On Tue, 2018-08-07 at 12:15 -0500, Segher Boessenkool wrote:
> 
> > 
> > > 
> > > +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" }
> > > }
> > > */
> > That's [0-7] but maybe you find [01234567] more readable here.
> Segher,  I fixed all the issues you pointed out except this one.
>  Since
> there are some uses of non consecutive numbers in one of the tests I 
> decided to leave [01234567] instead of using [0-7].  Here is the 
> latest version of the patch, there are no semantic changes, just
> syntactical ones to address the issues that you pointed out.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2018-08-20  Steve Ellcey  
> 
>   * config/aarch64/aarch64-protos.h
> (aarch64_use_simple_return_insn_p):
>   New prototype.
>   (aarch64_epilogue_uses): Ditto.
>   * config/aarch64/aarch64.c (aarch64_attribute_table): New
> array.
>   (aarch64_simd_decl_p): New function.
>   (aarch64_reg_save_mode): New function.
>   (aarch64_is_simd_call_p): New function.
>   (aarch64_function_ok_for_sibcall): Check for simd calls.
>   (aarch64_layout_frame): Check for simd function.
>   (aarch64_gen_storewb_pair): Handle E_TFmode.
>   (aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
>   (aarch64_gen_loadwb_pair): Handle E_TFmode.
>   (aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
>   (aarch64_gen_store_pair): Handle E_TFmode.
>   (aarch64_gen_load_pair): Ditto.
>   (aarch64_save_callee_saves): Handle different mode sizes.
>   (aarch64_restore_callee_saves): Ditto.
>   (aarch64_components_for_bb): Check for simd function.
>   (aarch64_epilogue_uses): New function.
>   (aarch64_process_components): Check for simd function.
>   (aarch64_expand_prologue): Ditto.
>   (aarch64_expand_epilogue): Ditto.
>   (aarch64_expand_call): Ditto.
>   (TARGET_ATTRIBUTE_TABLE): New define.
>   * config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
>   (FP_SIMD_SAVED_REGNUM_P): New macro.
>   * config/aarch64/aarch64.md (V23_REGNUM) New constant.
>   (simple_return): New define_expand.
>   (load_pair_dw_tftf): New instruction.
>   (store_pair_dw_tftf): Ditto.
>   (loadwb_pair_): Ditto.
>   ("storewb_pair_): Ditto.
> 
> 2018-08-20  Steve Ellcey  
> 
>   * gcc.target/aarch64/torture/aarch64-torture.exp: New file.
>   * gcc.target/aarch64/torture/simd-abi-1.c: New test.
>   * gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
>   * gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
>   * gcc.target/aarch64/torture/simd-abi-4.c: Ditto.

Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-09-04 Thread Steve Ellcey

On Tue, 2018-09-04 at 17:20 +, Wilco Dijkstra wrote:
> External Email
> 
> Hi Steve,
> 
> The latest version compiles the examples I used correctly, so it looks fine
> from that perspective (but see comments below). However the key point of
> the ABI is to enable better code generation when calling a vector function,
> and that will likely require further changes that may conflict with this 
> patch.
> Do you have patches for that work outstanding? It seems best to do this in
> one go.
> 
> Also did you check there is no regression in code generation for non-vector
> functions?

Yes, I checked for regressions in non-vector functions.  They way this
is written, there should be zero changes to any generated code that
does not involve vector functions.

I have some changes to improve code generation for SIMD functions but
it is not ready to go yet.  The main change I made to improve code
generation is to modify TARGET_HARD_REGNO_CALL_PART_CLOBBERED to accept
an instruction pointer so we could check to see if the call being made
is to a 'normal' function or a 'simd' function and return the correct
value.  Unfortunately, there are many places where we don't have the
call insn available and have to make the pessimistic guess.

I also changed get_call_reg_set_usage so that it could return different
default sets of what registers changed based on whether or not the
function being called was normal or simd.  Again, there are some places
where we have to make a pessimistic guess.

> 
> 
> +/* Return 1 if the register is used by the epilogue.  We need to say the
> +   return register is used, but only after epilogue generation is complete.
> +   Note that in the case of sibcalls, the values "used by the epilogue" are
> +   considered live at the start of the called function.
> +
> +   For SIMD functions we need to return 1 for FP registers that are saved and
> +   restored by a function but not zero in call_used_regs.  If we do not do
> +   this optimizations may remove the restore of the register.  */
> +
> +int
> +aarch64_epilogue_uses (int regno)
> +{
> +  if (epilogue_completed && (regno) == LR_REGNUM)
> +return 1;
> +  if (aarch64_simd_decl_p (cfun->decl) && FP_SIMD_SAVED_REGNUM_P
> (regno))
> +return 1;
> +  return 0;
> +}
> 
> I'm not convinced this is a good idea. It suggests GCC doesn't have the 
> correct set
> of caller/callee-save registers for vector functions (I don't see a change to 
> update
> CALL_USED_REGISTERS or aarch64_hard_regno_call_part_clobbered), which could
> lead to all kinds of interesting issues.

The problem is that there is no single correct set of caller/callee-
saved registers anymore.  There is one set for normal functions and a
different set for simd functions.  GCC isn't set up to have two
different caller/callee saved ABIs on one target, there is only
one CALL_USED_REGISTERS macro used to set one call_used_regs
array.  Should V16-V23 be set in CALL_USED_REGISTERS?  For 'normal'
functions, yes.  For SIMD functions, no.

> 
> +/* Return false for non-leaf SIMD functions in order to avoid
> +   shrink-wrapping them.  Doing this will lose the necessary
> +   save/restore of FP registers.  */
> +
> +bool
> +aarch64_use_simple_return_insn_p (void)
> +{
> +  if (aarch64_simd_decl_p (cfun->decl) && !crtl->is_leaf)
> +return false;
> +
> +  return true;
> +}
> 
> Such as this...
> 
> Wilco

[PATCH][aarch64] Enable ifunc resolver attribute by default

2017-06-12 Thread Steve Ellcey

I recently noticed that the GCC 'resolver' attribute used for ifunc's is not
on by default for aarch64 even though all the infrastructure to support it is
in place.  I made memcpy an ifunc on aarch64 in glibc and am looking at
possibly using it for libatomic too.  For this reason I would like to enable
it by default.  Note that the memcpy ifunc works even when this is not enabled
because glibc enables ifuncs by using the assembly language .type psuedo-op to
set the resolver attribute when GCC cannot do it with an attribute.  Using
an ifunc in libatomic does require this to be enabled and I do not see any
reason not to have it enabled by default.

Tested with no regressions, OK to check in?

Steve Ellcey
sell...@cavium.com



2017-06-12  Steve Ellcey  

* config.gcc (aarch64*-*-linux*): Enable IFUNC by default.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index a311cd95..e4caca4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -974,6 +974,7 @@ aarch64*-*-freebsd*)
tmake_file="${tmake_file} aarch64/t-aarch64 aarch64/t-aarch64-freebsd"
;;
 aarch64*-*-linux*)
+   default_gnu_indirect_function=yes
tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h"
tm_file="${tm_file} aarch64/aarch64-elf.h aarch64/aarch64-linux.h"
tmake_file="${tmake_file} aarch64/t-aarch64 aarch64/t-aarch64-linux"

Re: [PATCH/AARCH64] Improve aarch64 conditional compare usage

2017-06-16 Thread Steve Ellcey


https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00021.html

Ping.

Steve Ellcey
sell...@cavium.com

1 2 3 4 5 6 7 >

1 - 100 of 625 matches

Mail list logo