RFC: lra-constraints.c and TARGET_HARD_REGNO_CALL_PART_CLOBBERED question/patch
I have a reload/register allocation question and possible patch. While working on the Aarch64 SIMD ABI[1] I ran into a problem where GCC was saving and restoring registers that it did not need to. I tracked it down to lra-constraints.c and its use of targetm.hard_regno_call_part_clobbered on instructions that are not calls. Specifically need_for_call_save_p would check this macro even when the instruction in question (unknown to need_for_call_save_p) was not a call instruction. This seems wrong to me and I was wondering if anyone more familiar with the register allocator and reload could look at this patch and tell me if it seems reasonable or not. It passed bootstrap and I am running tests now. I am just wondering if there is any reason why this target function would need to be called on non-call instructions or if doing so is just an oversight/bug. Steve Ellcey sell...@cavium.com [1] https://gcc.gnu.org/ml/gcc/2018-07/msg00012.html 2018-07-11 Steve Ellcey * lra-constraints.c (need_for_call_save_p): Add insn argument and only check targetm.hard_regno_call_part_clobbered on calls. (need_for_split_p): Add insn argument, pass to need_for_call_save_p. (split_reg): Pass insn to need_for_call_save_p. (split_if_necessary): Pass curr_insn to need_for_split_p. (inherit_in_ebb): Ditto. diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 7eeec76..7fc8e7f 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -5344,7 +5344,7 @@ inherit_reload_reg (bool def_p, int original_regno, /* Return true if we need a caller save/restore for pseudo REGNO which was assigned to a hard register. */ static inline bool -need_for_call_save_p (int regno) +need_for_call_save_p (int regno, rtx_insn *insn) { lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0); return (usage_insns[regno].calls_num < calls_num @@ -5354,7 +5354,7 @@ need_for_call_save_p (int regno) ? lra_reg_info[regno].actual_call_used_reg_set : call_used_reg_set, PSEUDO_REGNO_MODE (regno), reg_renumber[regno]) - || (targetm.hard_regno_call_part_clobbered + || (CALL_P (insn) && targetm.hard_regno_call_part_clobbered (reg_renumber[regno], PSEUDO_REGNO_MODE (regno); } @@ -5374,7 +5374,8 @@ static bitmap_head ebb_global_regs; assignment pass because of too many generated moves which will be probably removed in the undo pass. */ static inline bool -need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno) +need_for_split_p (HARD_REG_SET potential_reload_hard_regs, + int regno, rtx_insn *insn) { int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno]; @@ -5416,7 +5417,8 @@ need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno) || (regno >= FIRST_PSEUDO_REGISTER && lra_reg_info[regno].nrefs > 3 && bitmap_bit_p (&ebb_global_regs, regno - || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno))); + || (regno >= FIRST_PSEUDO_REGISTER + && need_for_call_save_p (regno, insn))); } /* Return class for the split pseudo created from original pseudo with @@ -5536,7 +5538,7 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn, nregs = hard_regno_nregs (hard_regno, mode); rclass = lra_get_allocno_class (original_regno); original_reg = regno_reg_rtx[original_regno]; - call_save_p = need_for_call_save_p (original_regno); + call_save_p = need_for_call_save_p (original_regno, insn); } lra_assert (hard_regno >= 0); if (lra_dump_file != NULL) @@ -5759,7 +5761,7 @@ split_if_necessary (int regno, machine_mode mode, && INSN_UID (next_usage_insns) < max_uid) || (GET_CODE (next_usage_insns) == INSN_LIST && (INSN_UID (XEXP (next_usage_insns, 0)) < max_uid))) - && need_for_split_p (potential_reload_hard_regs, regno + i) + && need_for_split_p (potential_reload_hard_regs, regno + i, insn) && split_reg (before_p, regno + i, insn, next_usage_insns, NULL)) res = true; return res; @@ -6529,7 +6531,8 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail) && usage_insns[j].check == curr_usage_insns_check && (next_usage_insns = usage_insns[j].insns) != NULL_RTX) { - if (need_for_split_p (potential_reload_hard_regs, j)) + if (need_for_split_p (potential_reload_hard_regs, j, + curr_insn)) { if (lra_dump_file != NULL && head_p) {
Re: RFC: lra-constraints.c and TARGET_HARD_REGNO_CALL_PART_CLOBBERED question/patch
On Thu, 2018-07-12 at 07:17 +0100, Richard Sandiford wrote: > > So it only calls targetm.hard_regno_call_part_clobbered if such a > call is known to exist somewhere between the two references to > regno (although we don't have the calls themselves to hand). > > Thanks, > Richard Having the specific calls is the problem here because the normal ABI and the SIMD ABI are going to have different values for caller_save and part_clobbered. But I think I have found a better way to address this. Looking at 'need_for_call_save_p', when 'flag_ipa_ra' is set, we look at 'actual_call_used_reg_set' to see if we need to save a register. But even if a particular register doesn't show up as used, if 'hard_regno_call_part_clobbered' is set for that register we are going to return true and say we need a save/restore of the register. On Aarch64, if I 'really' didn't use the register I shouldn't need to save/restore it. If I used it but it is callee saved then on 'normal' functions I may need to save/restore it (to protect the upper bits) but on SIMD functions I do not need to save/restore it at all because the callee will save/restore the entire register and not just the lower 64 bits. I think that the patch I want to create is: when 'flag_ipa_ra' is true, remove the check of 'hard_regno_call_part_clobbered' from 'need_for_call_save_p'. Instead, check 'hard_regno_call_part_clobbered' in 'process_bb_lives' (where we know exactly what function is being called) and use that to set 'actual_call_used_reg_set'. In process_bb_live we know exactly what function we are calling and can check to see if it is a 'normal' function or a SIMD function. Steve Ellcey sell...@cavium.com
RFC: Patch to implement Aarch64 SIMD ABI
This is a patch to support the Aarch64 SIMD ABI [1] in GCC. I intend to eventually follow this up with two more patches; one to define the TARGET_SIMD_CLONE* macros and one to improve the GCC register allocation/usage when calling SIMD functions. The significant difference between the standard ARM ABI and the SIMD ABI is that in the normal ABI a callee saves only the lower 64 bits of registers V8-V15, in the SIMD ABI the callee must save all 128 bits of registers V8-V23. This patch checks for SIMD functions and saves the extra registers when needed. It does not change the caller behavour, so with just this patch there may be values saved by both the caller and callee. This is not efficient, but it is correct code. This patch bootstraps and passes the GCC testsuite but that only verifies I haven't broken anything, it doesn't validate the handling of SIMD functions. I tried to write some tests, but I could never get GCC to generate code that would save the FP callee-save registers in the prologue. Complex code might generate spills and fills but it never triggered the prologue/epilogue code to save V8-V23. If anyone has ideas on how to write a test that would cause GCC to generate this code I would appreciate some ideas. Just doing lots of calculations with lots of intermediate values doesn't seem to be enough. Steve Ellcey sell...@cavium.com [1] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi 2018-07-18 Steve Ellcey * config/aarch64/aarch64.c (aarch64_attribute_table): New array. (aarch64_simd_function_p): New function. (aarch64_layout_frame): Check for simd function. (aarch64_process_components): Ditto. (aarch64_expand_prologue): Ditto. (aarch64_expand_epilogue): Ditto. (TARGET_ATTRIBUTE_TABLE): New define. * config/aarch64/aarch64.h (FP_SIMD_SAVED_REGNUM_P): New define. * config/aarch64/aarch64.md (V23_REGNUM) New constant.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1369704..b25da11 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1026,6 +1026,15 @@ static const struct processor *selected_tune; /* The current tuning set. */ struct tune_params aarch64_tune_params = generic_tunings; +/* Table of machine attributes. */ +static const struct attribute_spec aarch64_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "aarch64_vector_pcs", 0, 0, true, false, false, false, NULL, NULL }, + { NULL, 0, 0, false, false, false, false, NULL, NULL } +}; + #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0) /* An ISA extension in the co-processor and main instruction set space. */ @@ -1404,6 +1413,18 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) return false; } +/* Return true if this is a definition of a vectorized simd function. */ + +static bool +aarch64_simd_function_p (tree fndecl) +{ + if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL) +return true; + if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL) +return false; + return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl; +} + /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The callee only saves the lower 64 bits of a 128-bit register. Tell the compiler the callee clobbers the top 64 bits when restoring the bottom 64 bits. */ @@ -4034,6 +4055,7 @@ aarch64_layout_frame (void) { HOST_WIDE_INT offset = 0; int regno, last_fp_reg = INVALID_REGNUM; + bool simd_function = aarch64_simd_function_p (cfun->decl); if (reload_completed && cfun->machine->frame.laid_out) return; @@ -4068,7 +4090,8 @@ aarch64_layout_frame (void) for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++) if (df_regs_ever_live_p (regno) - && !call_used_regs[regno]) + && (!call_used_regs[regno] + || (simd_function && FP_SIMD_SAVED_REGNUM_P (regno { cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED; last_fp_reg = regno; @@ -4105,7 +4128,8 @@ aarch64_layout_frame (void) { /* If there is an alignment gap between integer and fp callee-saves, allocate the last fp register to it if possible. */ - if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0) + if (regno == last_fp_reg && has_align_gap + && !simd_function && (offset & 8) == 0) { cfun->machine->frame.reg_offset[regno] = max_int_offset; break; @@ -4117,7 +4141,7 @@ aarch64_layout_frame (void) else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM && cfun->machine->frame.wb_candidate1 >= V0_REGNUM) cfun->mach
Re: RFC: Patch to implement Aarch64 SIMD ABI
On Thu, 2018-07-19 at 08:31 +0100, Richard Sandiford wrote: > > > @@ -4706,8 +4730,11 @@ aarch64_process_components (sbitmap > > components, bool prologue_p) > > while (regno != last_regno) > > { > > /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved > > - so DFmode for the vector registers is enough. */ > > - machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode; > > + so DFmode for the vector registers is enough. For simd functions > > + we want to save the entire register. */ > > + machine_mode mode = GP_REGNUM_P (regno) ? E_DImode > > + : (aarch64_simd_function_p (cfun->decl) ? E_TFmode : E_DFmode); > This condition also occurs in aarch64_push_regs and aarch64_pop_regs. > It'd probably be worth splitting it out into a subfunction. > > I think you also need to handle the writeback cases, which should work > for Q registers too. This will mean extra loadwb_pair and storewb_pair > patterns. > > LGTM otherwise FWIW. Yes, I see where I missed this in aarch64_push_regs and aarch64_pop_regs. I think that is why the second of Wilco's two examples (f2) is wrong. I am unclear about exactly what is meant by writeback and why we have it and how that and callee_adjust are used. Any chance someone could help me understand this part of the prologue/epilogue code better? The comments in aarch64.c/aarch64.h aren't really helping me understand what the code is doing or why it is doing it. Steve Ellcey sell...@cavium.com
Re: RFC: Patch to implement Aarch64 SIMD ABI
On Fri, 2018-07-20 at 11:11 +, Wilco Dijkstra wrote: > Steve Ellcey wrote: > > > Yes, I see where I missed this in aarch64_push_regs > > and aarch64_pop_regs. I think that is why the second of > > Wilco's two examples (f2) is wrong. I am unclear about > > exactly what is meant by writeback and why we have it and > > how that and callee_adjust are used. Any chance someone > > could help me understand this part of the prologue/epilogue > > code better? The comments in aarch64.c/aarch64.h aren't > > really helping me understand what the code is doing or > > why it is doing it. > Writeback is the same as a base update in a load or store. When > creating the frame there are 3 stack adjustments to be made: > creating stack for locals, pushing callee-saves and reserving space > for outgoing arguments. We merge these stack adjustments as much as > possible and use load/store with writeback for codesize and performance. > See the last part in layout_frame for the different cases. OK, I think I understand this a bit better now. I think my main problem is with the term 'writeback' which I am not used to seeing. But if I understand things correctly we are saving one or two registers and (possibly) updating the stack pointer using auto-increment/auto- decrement in one instruction and that the updating of SP is what you mean by 'writeback'. Correct? Steve Ellcey sell...@cavium.com
Re: RFC: Patch to implement Aarch64 SIMD ABI
Here is an updated version of my patch for the Aarch64 SIMD ABI. I think the writeback register saves are correct now and I improved the register allocation by defining REG_ALLOC_ORDER. I also added clobbers to expand_call when calling a non-SIMD function from a SIMD function. I am still testing but any feedback on what I have so far would be helpful. Steve Ellcey sell...@cavium.com 2018-07-23 Steve Ellcey * config/aarch64/aarch64.c (aarch64_attribute_table): New array. (aarch64_simd_decl_p): New function. (aarch64_reg_save_mode): New function. (aarch64_is_simd_call_p): New function. (aarch64_layout_frame): Check for simd function. (aarch64_gen_storewb_pair): Handle E_TFmode. (aarch64_push_regs): Use aarch64_reg_save_mode to get mode. (aarch64_gen_loadwb_pair): Handle E_TFmode. (aarch64_pop_regs): Use aarch64_reg_save_mode to get mode. (aarch64_components_for_bb): Check for simd function. (aarch64_process_components): Ditto. (aarch64_expand_prologue): Ditto. (aarch64_expand_epilogue): Ditto. (aarch64_expand_call): Ditto. (TARGET_ATTRIBUTE_TABLE): New define. * config/aarch64/aarch64.h (REG_ALLOC_ORDER): New define. (HONOR_REG_ALLOC_ORDER): Ditto. (FP_SIMD_SAVED_REGNUM_P): Ditto. * config/aarch64/aarch64.md (V23_REGNUM) New constant. (loadwb_pair_): New instruction. ("storewb_pair_): Ditto.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1369704..d7557e2 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1026,6 +1026,15 @@ static const struct processor *selected_tune; /* The current tuning set. */ struct tune_params aarch64_tune_params = generic_tunings; +/* Table of machine attributes. */ +static const struct attribute_spec aarch64_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "aarch64_vector_pcs", 0, 0, true, false, false, false, NULL, NULL }, + { NULL, 0, 0, false, false, false, false, NULL, NULL } +}; + #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0) /* An ISA extension in the co-processor and main instruction set space. */ @@ -1404,6 +1413,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) return false; } +/* Return true if this is a definition of a vectorized simd function. */ + +static bool +aarch64_simd_decl_p (tree fndecl) +{ + if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL) +return true; + if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL) +return false; + return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl; +} + +static +machine_mode aarch64_reg_save_mode (tree fndecl, unsigned regno) +{ + return GP_REGNUM_P (regno) + ? E_DImode + : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode); +} + /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The callee only saves the lower 64 bits of a 128-bit register. Tell the compiler the callee clobbers the top 64 bits when restoring the bottom 64 bits. */ @@ -1498,6 +1527,13 @@ aarch64_is_noplt_call_p (rtx sym) return false; } +static bool +aarch64_is_simd_call_p (rtx sym) +{ + tree decl = SYMBOL_REF_DECL (sym); + return decl && aarch64_simd_decl_p (decl); +} + /* Return true if the offsets to a zero/sign-extract operation represent an expression that matches an extend operation. The operands represent the paramters from @@ -4034,6 +4070,7 @@ aarch64_layout_frame (void) { HOST_WIDE_INT offset = 0; int regno, last_fp_reg = INVALID_REGNUM; + bool simd_function = aarch64_simd_decl_p (cfun->decl); if (reload_completed && cfun->machine->frame.laid_out) return; @@ -4068,7 +4105,8 @@ aarch64_layout_frame (void) for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++) if (df_regs_ever_live_p (regno) - && !call_used_regs[regno]) + && (!call_used_regs[regno] + || (simd_function && FP_SIMD_SAVED_REGNUM_P (regno { cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED; last_fp_reg = regno; @@ -4105,7 +4143,8 @@ aarch64_layout_frame (void) { /* If there is an alignment gap between integer and fp callee-saves, allocate the last fp register to it if possible. */ - if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0) + if (regno == last_fp_reg && has_align_gap + && !simd_function && (offset & 8) == 0) { cfun->machine->frame.reg_offset[regno] = max_int_offset; break; @@ -4117,7 +4156,7 @@ aarch64_layout_frame (void) else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM && cfun->machine->
[Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available
This is a patch for PR 86538, to define an __ARM_FEATURE_LSE macro when LSE is available. Richard Earnshaw closed PR 86538 as WONTFIX because the ACLE (Arm C Language Extension) does not require this macro and because he is concerned that it might encourage people to use inline assembly instead of the __sync and atomic intrinsics. (See actual comments in the defect report.) While I agree that we want people to use the intrinsics I still think there are use cases where people may want to know if LSE is available or not and there is currrently no (simple) way to determine if this feature is available since it can be turned or and off independently of the architecture used. Also, as a general principle, I think any feature that can be toggled on or off by the compiler should provide a way for users to determine what its state is. So what do other ARM maintainers and users think? Is this a useful feature to have in GCC? Steve Ellcey sell...@cavium.com 2018-07-24 Steve Ellcey PR target/86538 * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add define of __ARM_FEATURE_LSE. diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c index 40c738c..e057ba9 100644 --- a/gcc/config/aarch64/aarch64-c.c +++ b/gcc/config/aarch64/aarch64-c.c @@ -154,6 +154,9 @@ aarch64_update_cpp_builtins (cpp_reader *pfile) aarch64_def_or_undef (TARGET_SM4, "__ARM_FEATURE_SM4", pfile); aarch64_def_or_undef (TARGET_F16FML, "__ARM_FEATURE_FP16_FML", pfile); + /* This is not required by ACLE, but it is useful. */ + aarch64_def_or_undef (TARGET_LSE, "__ARM_FEATURE_LSE", pfile); + /* Not for ACLE, but required to keep "float.h" correct if we switch target between implementations that do or do not support ARMv8.2-A 16-bit floating-point extensions. */
Re: [Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available
On Tue, 2018-07-24 at 22:04 +0100, James Greenhalgh wrote: > > > I'd say this patch isn't desirable for trunk. I'd be interested in use cases > that need a static decision on presence of LSE that are not better expressed > using higher level language features. > > Thanks, > James How about when building the higher level features? Right now, in sysdeps/aarch64/atomic-machine.h, we hardcode ATOMIC_EXCHANGE_USES_CAS to 0. If we had __ARM_FEATURE_LSE we could use that to determine if we wanted to set ATOMIC_EXCHANGE_USES_CAS to 0 or 1 which would affect the call generated in nptl/pthread_spin_lock.c. That would be useful if we built a lipthread specifically for a platform that had LSE. Steve Ellcey sell...@cavium.com
Re: RFC: Patch to implement Aarch64 SIMD ABI
Here is version 3 of my patch to implement the SIMD ABI on Aarch64. I am having a problem with how to handle a SIMD function calling a non-SIMD function. When this happens the SIMD function needs to save V8 to V23 because it cannot count on the non-SIMD function to save all 128 bits of these registers. I thought I had this working in the last patch but as I write test cases, it appears that it is not working and I am not sure how to implement it. I tried adding clobbers in aarch64_expand_call but that is not working (see code in this patch in aarch64_expand_call). If I add them to 'call' which is a parallel insn, they are ignored. If I find the underlying call instruction that is part of the parallel then the clobbers get added to the instruction but then the call itself is not recognized with the extra clobbers in place. I don't think we want to add new call instructions in aarch64.md to handle the vector register saves and restores. Am I trying to add the clobbers in the wrong place? Where and when should extra clobbers be added to a call that is going to clobber more registers than what is indicated by CALL_USED_REGISTERS? I suppose I could use TARGET_HARD_REGNO_CALL_PART_CLOBBERED but I would have to extend it to include the call instruction as an argument so the the code could determine if the call being made was to a simd or non-simd function. Steve Ellcey sell...@cavium.com 2018-07-25 Steve Ellcey * config/aarch64/aarch64.c (aarch64_attribute_table): New array. (aarch64_simd_decl_p): New function. (aarch64_reg_save_mode): New function. (aarch64_is_simd_call_p): New function. (aarch64_function_ok_for_sibcall): Check for simd calls. (aarch64_layout_frame): Check for simd function. (aarch64_gen_storewb_pair): Handle E_TFmode. (aarch64_push_regs): Use aarch64_reg_save_mode to get mode. (aarch64_gen_loadwb_pair): Handle E_TFmode. (aarch64_pop_regs): Use aarch64_reg_save_mode to get mode. (aarch64_components_for_bb): Check for simd function. (aarch64_process_components): Ditto. (aarch64_expand_prologue): Ditto. (aarch64_expand_epilogue): Ditto. (aarch64_expand_call): Ditto. (TARGET_ATTRIBUTE_TABLE): New define. * config/aarch64/aarch64.h (REG_ALLOC_ORDER): New define. (HONOR_REG_ALLOC_ORDER): Ditto. (FP_SIMD_SAVED_REGNUM_P): Ditto. * config/aarch64/aarch64.md (V23_REGNUM) New constant. (loadwb_pair_): New instruction. ("storewb_pair_): Ditto.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index fa01475..cc642f5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1027,6 +1027,15 @@ static const struct processor *selected_tune; /* The current tuning set. */ struct tune_params aarch64_tune_params = generic_tunings; +/* Table of machine attributes. */ +static const struct attribute_spec aarch64_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "aarch64_vector_pcs", 0, 0, true, false, false, false, NULL, NULL }, + { NULL, 0, 0, false, false, false, false, NULL, NULL } +}; + #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0) /* An ISA extension in the co-processor and main instruction set space. */ @@ -1405,6 +1414,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) return false; } +/* Return true if this is a definition of a vectorized simd function. */ + +static bool +aarch64_simd_decl_p (tree fndecl) +{ + if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL) +return true; + if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL) +return false; + return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl; +} + +static +machine_mode aarch64_reg_save_mode (tree fndecl, unsigned regno) +{ + return GP_REGNUM_P (regno) + ? E_DImode + : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode); +} + /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The callee only saves the lower 64 bits of a 128-bit register. Tell the compiler the callee clobbers the top 64 bits when restoring the bottom 64 bits. */ @@ -1499,6 +1528,13 @@ aarch64_is_noplt_call_p (rtx sym) return false; } +static bool +aarch64_is_simd_call_p (rtx sym) +{ + tree decl = SYMBOL_REF_DECL (sym); + return decl && aarch64_simd_decl_p (decl); +} + /* Return true if the offsets to a zero/sign-extract operation represent an expression that matches an extend operation. The operands represent the paramters from @@ -3269,10 +3305,11 @@ aarch64_split_sve_subreg_move (rtx dest, rtx ptrue, rtx src) } static bool -aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED, - tree exp ATTRIBUTE_
[Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute
Here is a new version of my patch to support the Aarch64 SIMD ABI [1] in GCC. I think this is complete enought to be considered for check in. I wrote a few new tests and put them in a new gcc.target/torture directory so they would be run with multiple optimization options. I also verified that there are no regressions in the GCC testsuite. The significant difference between the standard ARM ABI and the SIMD ABI is that in the normal ABI a callee saves only the lower 64 bits of registers V8-V15, in the SIMD ABI the callee must save all 128 bits of registers V8-V23. As I mentioned in my RFC, I intend to (eventually) follow this patch with two more, one to define the TARGET_SIMD_CLONE* macros and one to improve the GCC register allocation/usage when calling SIMD functions. Right now, a caller calling a SIMD function will save more registers than it needs to because some of those registers will also be saved by the callee. Steve Ellcey sell...@cavium.com [1] https://developer.arm.com/products/software-development-tools/hpc/a rm-compiler-for-hpc/vector-function-abi Compiler ChangeLog: 2018-07-31 Steve Ellcey * config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p): New prototype. (aarch64_epilogue_uses): Ditto. * config/aarch64/aarch64.c (aarch64_attribute_table): New array. (aarch64_simd_decl_p): New function. (aarch64_reg_save_mode): New function. (aarch64_is_simd_call_p): New function. (aarch64_function_ok_for_sibcall): Check for simd calls. (aarch64_layout_frame): Check for simd function. (aarch64_gen_storewb_pair): Handle E_TFmode. (aarch64_push_regs): Use aarch64_reg_save_mode to get mode. (aarch64_gen_loadwb_pair): Handle E_TFmode. (aarch64_pop_regs): Use aarch64_reg_save_mode to get mode. (aarch64_gen_store_pair): Handle E_TFmode. (aarch64_gen_load_pair): Ditto. (aarch64_save_callee_saves): Handle different mode sizes. (aarch64_restore_callee_saves): Ditto. (aarch64_components_for_bb): Check for simd function. (aarch64_epilogue_uses): New function. (aarch64_process_components): Ditto. (aarch64_expand_prologue): Ditto. (aarch64_expand_epilogue): Ditto. (aarch64_expand_call): Ditto. (TARGET_ATTRIBUTE_TABLE): New define. * config/aarch64/aarch64.h (EPILOGUE_USES): Redefine. (FP_SIMD_SAVED_REGNUM_P): New macro. * config/aarch64/aarch64.md (V23_REGNUM) New constant. (simple_return): New define_expand. (load_pair_dw_tftf): New instruction. (store_pair_dw_tftf): Ditto. (loadwb_pair_): Ditto. ("storewb_pair_): Ditto. Testsuite ChangeLog: 2018-07-31 Steve Ellcey * gcc.target/aarch64/torture/aarch64-torture.exp: New file. * gcc.target/aarch64/torture/simd-abi-1.c: New test. * gcc.target/aarch64/torture/simd-abi-2.c: Ditto. * gcc.target/aarch64/torture/simd-abi-3.c: Ditto. * gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index af5db9c..99c962f 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx); bool aarch64_symbolic_address_p (rtx); bool aarch64_uimm12_shift (HOST_WIDE_INT); bool aarch64_use_return_insn_p (void); +bool aarch64_use_simple_return_insn_p (void); const char *aarch64_mangle_builtin_type (const_tree); const char *aarch64_output_casesi (rtx *); @@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx); /* Check for a legitimate floating point constant for FMOV. */ bool aarch64_float_const_representable_p (rtx); +extern int aarch64_epilogue_uses (int); + #if defined (RTX_CODE) void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode, rtx label_ref); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index fa01475..9e6827a 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1027,6 +1027,15 @@ static const struct processor *selected_tune; /* The current tuning set. */ struct tune_params aarch64_tune_params = generic_tunings; +/* Table of machine attributes. */ +static const struct attribute_spec aarch64_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "aarch64_vector_pcs", 0, 0, true, false, false, false, NULL, NULL }, + { NULL, 0, 0, false, false, false, false, NULL, NULL } +}; + #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0) /* An ISA extension in the co-processor and main instruction set space. */ @@ -1405,6 +1414,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) return false; } +/* Return true if this is
[patch, libtool] Top-level libtool.m4 patch for autoconf 2.68
Some of the binutils directories (libgloss and newlib) have moved from autoconf 2.64 to 2.68. While running autoconf 2.68 on these directories I get some warnings coming from libtool.m4 which is at the top level of the GCC and binutils trees. Fixes for these warnings are already in the upstream libtool sources and the changes work fine with autoconf 2.64 as well as autoconf 2.68 so I would like to put these fixes into the libtool.m4 at the GCC and binutils top-levels. I ran autoconf in various GCC and binutils directories to verify that no signficant changes ocurred in any directories, is this patch OK to check in to the GCC tree? I will send seperate email to the binutils mailing list though perhaps permission to check it in to the GCC tree is sufficient to put it in binutils as well? Steve Ellcey sell...@mips.com 2013-08-20 Steve Ellcey * libtool.m4 (_LT_SYS_MODULE_PATH_AIX): Put AC_LANG_PROGRAM call in brackets. (irix*|nonstopux*): use AC_LANG_SOURCE in AC_LINK_IFELSE call. diff --git a/libtool.m4 b/libtool.m4 index 8a14e2b..7119b35 100644 --- a/libtool.m4 +++ b/libtool.m4 @@ -1079,7 +1079,7 @@ m4_defun([_LT_DARWIN_LINKER_FEATURES], # to the aix ld manual. m4_defun([_LT_SYS_MODULE_PATH_AIX], [m4_require([_LT_DECL_SED])dnl -AC_LINK_IFELSE(AC_LANG_PROGRAM,[ +AC_LINK_IFELSE([AC_LANG_PROGRAM],[ lt_aix_libpath_sed=' /Import File Strings/,/^$/ { /^0/ { @@ -4926,7 +4926,16 @@ _LT_EOF # implicitly export all symbols. save_LDFLAGS="$LDFLAGS" LDFLAGS="$LDFLAGS -shared ${wl}-exported_symbol ${wl}foo ${wl}-update_registry ${wl}/dev/null" -AC_LINK_IFELSE(int foo(void) {}, + AC_LINK_IFELSE( + [AC_LANG_SOURCE( +[AC_LANG_CASE([C], [[int foo (void) { return 0; }]], + [C++], [[int foo (void) { return 0; }]], + [Fortran 77], [[ + subroutine foo + end]], + [Fortran], [[ + subroutine foo + end]])])], _LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && func_echo_all "${wl}-set_version ${wl}$verstring"` ${wl}-update_registry ${wl}${output_objdir}/so_locations ${wl}-exports_file ${wl}$export_symbols -o $lib' ) LDFLAGS="$save_LDFLAGS"
Re: [PATCH] Fix PR57451 (Incorrect debug ranges emitted for -freorder-blocks-and-partition -g)
On Mon, 2013-08-19 at 22:21 -0700, Teresa Johnson wrote: > 2013-08-19 Teresa Johnson > > PR rtl-optimizations/57451 > * final.c (reemit_insn_block_notes): Prevent lexical blocks > from crossing split section boundaries. > > * testsuite/g++.dg/tree-prof/pr57451.C: New test. Teresa, This patch is causing a problem in my mips compiler. I am creating a cross-toolchain running on x86 targeting mips-mti-linux-gnu and if I compile glibc using a GCC that has this patch all of the programs I compile go into an infinite loop when run under the qemu simulator. I think it is the dynamic linker that is getting miscompiled, but I haven't tracked down the exact nature of the miscompilation yet. Has anyone else reported any problems with it? I am adding Richard Sandiford to the email list since he is a mips expert and also might have some insight to using next_active_insn vs. next_real_insn vs. next_insn, though it doesn't look like you have changed what is used here. Steve Ellcey sell...@mips.com
Re: [PATCH] Fix PR57451 (Incorrect debug ranges emitted for -freorder-blocks-and-partition -g)
On Fri, 2013-08-23 at 07:38 -0700, Teresa Johnson wrote: > > It should be committed soon, by Kaz or myself once testing finishes. > > Please let me know if that doesn't fix your failures. > > Fix committed as r201941. Let me know if you still have issues. > > Teresa This patch fixes the problem for me. Thanks Teresa, Thanks Kaz. Steve Ellcey
Re: [patch, libgfortran, configure] Cross-compile support for libgfortran
On Mon, 2013-09-23 at 16:26 +0100, Marcus Shawcroft wrote: > On 4 June 2013 20:49, Steve Ellcey wrote: > > This patch allows me to build libgfortran for a cross-compiling toolchain > > using newlib. Currently the checks done by AC_CHECK_FUNCS_ONCE fail with > > my toolchain because the compile/link fails due to the configure script not > > using the needed linker script in the link command. The check for > > with_newlib > > is how libjava deals with the problem and it fixes my build problems. > > > > My only concern is defining HAVE_STRTOLD, strtold exists in my newlib but > > I am not sure if that is true for all newlib builds. I didn't see any > > flags that I could easily use to check for long double support in the > > libgfortran configure.ac, but it seems to assume that the type exists. > > > > OK to checkin? > > This patch breaks systems where long double is wider than double. The > newlib implementation provides strtold for systems where double is as > wide as long double but not on systems where long double is wider. > > I don;t see any issues with AC_CHECK_FUNC_ONCE when cross configuring > aarch64 toolchains but I would have thought that fixing the link test > issue you encountered would be preferable to hard coding assumptions > in the configure script? > > Cheers > /Marcus AC_CHECK_FUNC_ONCE may work for aarch64 but I don't think there is anyway to make it work generally for all platforms. In the libjava configure.ac file is the comments: [dnl Botheration. Now we've got to detect the exception model. dnl Link tests against libgcc.a are problematic since -- at least dnl as of this writing -- we've not been given proper -L bits for dnl single-tree newlib and libgloss. and: # We are being configured with a cross compiler. AC_REPLACE_FUNCS # may not work correctly, because the compiler may not be able to # link executables. The libstdc++ configure has comments to the same effect: # This lets us hard-code the functionality we know we'll have in the cross # target environment. "Let" is a sugar-coated word placed on an especially # dull and tedious hack, actually. # # Here's why GLIBCXX_CHECK_MATH_SUPPORT, and other autoconf macros # that involve linking, can't be used: #"cannot open sim-crt0.o" #"cannot open crt0.o" # etc. All this is because there currently exists no unified, consistent # way for top level CC information to be passed down to target directories: # newlib includes, newlib linking info, libgloss versus newlib crt0.o, etc. # When all of that is done, all of this hokey, excessive AC_DEFINE junk for # crosses can be removed. The libstdc++ configure file has something that looks like it might be intended to address the problem of long double support (long_double_math_on_this_cpu) but I don't see where that variable is ever set in any configure file even though it is used in the libstdc++ configure file. Steve Ellcey sell...@mips.com
Re: RFA: Store the REG_BR_PROB probability directly as an int
On Tue, 2013-09-24 at 21:07 +0200, Andreas Schwab wrote: > Richard Sandiford writes: > > > Sorry for the breakage. I think we need to handle INT_LIST in the same way > > as INSN_LIST though, and eliminate in XEXP (x, 1). > > > > How about the attached? Testing in progress... > > Works for me as well. > > Andreas. This patch worked for me as well on MIPS. I did a complete build and test overnight. Steve Ellcey sell...@mips.com
Re: [PATCH] Fix libgfortran cross compile configury w.r.t newlib
On Thu, 2013-09-26 at 14:47 +0100, Marcus Shawcroft wrote: > I'm in two minds about whether further sticky tape of this form is the > right approach or whether the original patch should be reverted until a > proper fix that does not regress the tree can be found. > > Thoughts? > > 2013-09-26 Marcus Shawcroft > > * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement > dependent on gcc_no_link. > > Cheers > /Marcus I think this patch is a good fix. I (obviously) don't favor reverting the previous patch because that would re-break the Fortran build on MIPS bare-metal cross compilers (or any compiler where a linker script is needed). Any 'proper' fix should address libstdc++, libjava, and other libraries as well as libgfortran and I don't know what a cleaner fix would be. In fact I would say the other libraries should consider using this fix. The only reason they don't run into this problem too is that they don't depend on any long double functions or any other functions that are optionally built by newlib. I will test this patch on my targets and make sure it works for me, but I don't see why it would not. Steve Ellcey sell...@mips.com
Re: [PATCH] Fix libgfortran cross compile configury w.r.t newlib
On Thu, 2013-09-26 at 14:47 +0100, Marcus Shawcroft wrote: > I'm in two minds about whether further sticky tape of this form is the > right approach or whether the original patch should be reverted until a > proper fix that does not regress the tree can be found. > > Thoughts? > > 2013-09-26 Marcus Shawcroft > > * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement > dependent on gcc_no_link. > > Cheers > /Marcus Well, I thought this patch would work for me, but it does not. It looks like gcc_no_link is set to 'no' on my target because, technically, I can link even if I don't use a linker script. I just can't find any functions. % cat x.c int main (void) { return 0; } % mips-mti-elf-gcc x.c -o x /local/home/sellcey/nightly/install-mips-mti-elf/lib/gcc/mips-mti-elf/4.9.0/../../../../mips-mti-elf/bin/ld: warning: cannot find entry symbol __start; defaulting to 00400098 % echo $? 0 % cat y.c int main (void) { exit (0); } % install-mips-mti-elf/bin/mips-mti-elf-gcc y.c -o y /local/home/sellcey/nightly/install-mips-mti-elf/lib/gcc/mips-mti-elf/4.9.0/../../../../mips-mti-elf/bin/ld: warning: cannot find entry symbol __start; defaulting to 00400098 /tmp/ccdG78PN.o: In function `main': y.c:(.text+0x14): undefined reference to `exit' collect2: error: ld returned 1 exit status ubuntu-sellcey % echo $? 1 Steve Ellcey sell...@mips.com
Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib
On Tue, 2013-10-01 at 12:40 +0100, Marcus Shawcroft wrote: > On 30/09/13 13:40, Marcus Shawcroft wrote: > > >> Well, I thought this patch would work for me, but it does not. It looks > >> like gcc_no_link is set to 'no' on my target because, technically, I can > >> link even if I don't use a linker script. I just can't find any > >> functions. > >> > > > In which case gating on gcc_no_link could be replaced with a test that > > looks to see if we can link with the library. Perhaps looking for > > exit() or some such that might reasonably be expected to be present. > > > > For example: > > > > AC_CHECK_FUNC(exit) > > if test "x${with_newlib}" = "xyes" -a "x${ac_cv_func_exit}" = "xno"; then > > > > /Marcus This patch works on my mips-mti-elf target. Steve Ellcey sell...@mips.com
Re: [patch, mips] Patch for new mips triplet - mips-mti-elf
On Sun, 2012-09-30 at 19:53 +0100, Richard Sandiford wrote: > Sorry for only noticing now, but this produced: > > ERROR: gcc.target/mips/pr37362.c -O0 : syntax error in target selector > "target ! mips*-sde-elf mips*-mti-elf" for " dg-do 2 compile { target { ! > mips*-sde-elf mips*-mti-elf } } " > ... > > We need another set of braces. Tested on mipsisa64-elf and applied. > > Richard Thanks for fixing this, I am not sure why I didn't notice it in my testing. Steve Ellcey sell...@mips.com
[patch, mips, testsuite] Fix test to handle optimizations
The gcc.target/mips/ext_ins.c was failing in little endian mode on MIPS because the compiler is smart enough now to see that 'c' is uninitialized and it can insert the field 'a' into 'c' with a shift and a full store instead of an insert because the store just overwrites unintialized data. I changed the code to force the compiler to preserve the other fields of 'c' and that makes it use the insert instruction in both big and little endian modes. Tested on mips-mti-elf. OK to checkin? Steve Ellcey sell...@mips.com 2012-10-08 Steve Ellcey * gcc.target/ext_ins.c: Modify f2 to aviod uninitialized data. diff --git a/gcc/testsuite/gcc.target/mips/ext_ins.c b/gcc/testsuite/gcc.target/mips/ext_ins.c index f0169bc..36f0f3f 100644 --- a/gcc/testsuite/gcc.target/mips/ext_ins.c +++ b/gcc/testsuite/gcc.target/mips/ext_ins.c @@ -18,9 +18,8 @@ NOMIPS16 unsigned int f1 (struct A a) return a.j; } -NOMIPS16 void f2 (int i) +NOMIPS16 struct A f2 (struct A a, int i) { - struct A c; - c.j = i; - func (c); + a.j = i; + return a; }
[patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os
The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization level does not do whichever optimization it is that results in a bbit instead of a bbit[01]l. I would like to skip this test for -Os the way it already gets skipped for -O0. Tested on mips-mti-elf. Ok for checkin? Steve Ellcey sell...@mips.com 2012-10-08 Steve Ellcey * gcc.target/octeon-bbit-2.c: Skip for -Os optimization level. diff --git a/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c b/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c index 9bd8dce..7d88d68 100644 --- a/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c +++ b/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-march=octeon -mbranch-likely -fno-unroll-loops" } */ -/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */ +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-Os" } { "" } } */ /* { dg-final { scan-assembler "\tbbit\[01\]\t" } } */ /* { dg-final { scan-assembler-not "\tbbit\[01\]l\t" } } */ /* { dg-final { scan-assembler "\tbnel\t" } } */
Re: [patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os
On Mon, 2012-10-08 at 11:09 -0700, Mike Stump wrote: > On Oct 8, 2012, at 9:21 AM, Steve Ellcey wrote: > > The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization > > level does not do whichever optimization it is that results in a bbit > > instead > > of a bbit[01]l. I would like to skip this test for -Os the way it already > > gets > > skipped for -O0. > > > > Tested on mips-mti-elf. Ok for checkin? > > Ideally I'd like a mips expert to weigh in on this. The issue is, is the > code smaller with the other instruction? > If so, is there a reasonable way to obtain that type of win more often in the > port with -Os? Now, if you are that > mips expert, that's fine, but, trivially you don't need my approval to check > it in. If the code is larger, > trivially, the patch is ok. If the optimization generally hurt code size and > can't be made to win, the patch is ok. > If always the same size, it would seem ok. I just don't have the mips > specific background to know which case this > is. Well, I checked -O1, -O2 and -Os. The -Os code is smaller then -O1 but larger then -O2. I didn't dig deep enough to find out exactly which optimization is causing the change in instruction usage. Perhaps Richard Sandiford will have an opinion on this change. Steve Ellcey sell...@mips.com
[patch, mips] Modify mips-mti-linux-gnu build
This patch makes a few changes to the work I have been doing on the mips-mti-linux-gnu target that I recently added. It adds an -mabi=64 build to MULTILIBS and changes the order of MULTILIB_OPTIONS to be consistent with the mips-mti-elf target. It also makes the N32 ABI the default when compiling for mips64* targets, this is also to be consistent with the mips-mti-elf target. Tested with a cross-build and running the GCC testsuite using qemu. OK to checkin? Steve Ellcey sell...@mips.com 2012-10-30 Steve Ellcey * config/mips/mti-linux.h (SYSROOT_SUFFIX_SPEC): Change order and add mabi=64. (DRIVER_SELF_SPECS): Make -n32 the default on mips64* archs. * config/mips/t-mti-linux (MULTILIB_OPTIONS): Change order. (MULTILIB_DIRNAMES): Ditto. (MULTILIB_EXCEPTIONS): New. diff --git a/gcc/config/mips/mti-linux.h b/gcc/config/mips/mti-linux.h index 36c003c..cda9bdc 100644 --- a/gcc/config/mips/mti-linux.h +++ b/gcc/config/mips/mti-linux.h @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3. If not see /* This target is a multilib target, specify the sysroot paths. */ #undef SYSROOT_SUFFIX_SPEC #define SYSROOT_SUFFIX_SPEC \ - "%{mips32:/mips32}%{mips64:/mips64}%{mips64r2:/mips64r2}%{msoft-float:/sof}%{mel|EL:/el}%{mabi=64:/64}%{mabi=n32:/n32}" + "%{mips32:/mips32}%{mips64:/mips64}%{mips64r2:/mips64r2}%{mabi=64:/64}%{mel|EL:/el}%{msoft-float:/sof}" #undef DRIVER_SELF_SPECS #define DRIVER_SELF_SPECS \ @@ -36,6 +36,10 @@ along with GCC; see the file COPYING3. If not see /* Infer the -msynci setting from -march if not explicitly set. */ \ MIPS_ISA_SYNCI_SPEC, \ \ + /* If no ABI option is specified, infer one from the ISA level \ + or -mgp setting. */ \ + "%{!mabi=*: %{" MIPS_32BIT_OPTION_SPEC ": -mabi=32;: -mabi=n32}}", \ + \ /* Base SPECs. */ \ BASE_DRIVER_SELF_SPECS \ \ diff --git a/gcc/config/mips/t-mti-linux b/gcc/config/mips/t-mti-linux index ba11706..6d280cd 100644 --- a/gcc/config/mips/t-mti-linux +++ b/gcc/config/mips/t-mti-linux @@ -19,6 +19,15 @@ # The default build is mips32r2, hard-float big-endian. Add mips32, # soft-float, and little-endian variations. -MULTILIB_OPTIONS = mips32/mips64/mips64r2 msoft-float EL -MULTILIB_DIRNAMES = mips32 mips64 mips64r2 sof el +MULTILIB_OPTIONS = mips32/mips64/mips64r2 mabi=64 EL msoft-float +MULTILIB_DIRNAMES = mips32 mips64 mips64r2 64 el sof MULTILIB_MATCHES = EL=mel EB=meb + +# The 64 bit ABI is not supported on the mips32 architecture. +MULTILIB_EXCEPTIONS += *mips32*/*mabi=64* + +# The 64 bit ABI is not supported on the mips32r2 architecture. +# Because mips32r2 is the default we can't use that flag to trigger +# the exception so we check for mabi=64 with no specific mips flag +# instead. +MULTILIB_EXCEPTIONS += mabi=64*
Re: [PATCH][i386]Fix PR 57756
On Thu, 2013-10-17 at 06:03 -0700, Diego Novillo wrote: > On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn wrote: > > > How is Google going to change its patch commit policies to ensure that > > this does not happen again? > > There is nothing to change. Google follows > http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed > the oversight and if there is any other fallout from his patch, he > will address it. > > I don't see anything out of the ordinary here. > > > Diego. FYI: I just want to make sure everyone is aware that there are still broken targets. rs6000 may be fixed but mips still doesn't work and I presume other platforms like sparc are also still broken. /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c: In function 'void cpp_atomic_builtins(cpp_reader*)': /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:594:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:606:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:618:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:630:7: error: 'target_flags_explicit' was not declared in this scope make[1]: *** [c-family/c-cppbuiltin.o] Error 1 Sriraman, are you looking at how to fix this globally or are the target maintainers expected to fix this? Currently I am using this one line patch to get things building, but I presume this is not what we want long term. % git diff opth-gen.awk diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk index 01c5ab6..46bd570 100644 --- a/gcc/opth-gen.awk +++ b/gcc/opth-gen.awk @@ -114,6 +114,7 @@ print "};" print "extern struct gcc_options global_options;" print "extern const struct gcc_options global_options_init;" print "extern struct gcc_options global_options_set;" +print "#define target_flags_explicit global_options_set.x_target_flag s" print "#endif" print "#endif" print "" Steve Ellcey sell...@mips.com
Re: [PATCH][i386]Fix PR 57756
On Thu, 2013-10-17 at 11:06 -0700, Xinliang David Li wrote: > JBG, Steve, Can you help testing Sri's latest patch on your targets? > This will help speed up the process. > > thanks, > > David I have already tested the one line patch to opth-gen.awk. It fixes the MIPS build and a testsuite run looked OK too. Steve Ellcey sell...@mips.com
[patch, mips] Fix optimization bug involving nor instruction
While looking at some MIPS code I found that GCC was not making optimal use of the MIPS nor instruction. If you compile this function: int f (int a, int b) { return ~(a|b); } It generates: or $2,$4,$5 nor $2,$0,$2 instead of just: nor $2,$4,$5 The problem is that mips_rtx_costs does not know that (AND (NOT op1) (NOT op2)) can be done in a single operation so it calculates the cost of the expression as the cost of 3 operations, this results in combine thinking that (NOT (OR op1 op2) is cheaper. This patch changes mips_rtx_costs to change the cost calculation of a nor to be the cost of one operation (plus the cost of getting the operands into registers) instead of the cost of three operations. Tested with no regressions, OK to checkin? Steve Ellcey sell...@mips.com 2013-10-22 Steve Ellcey * config/mips/mips.c (mips_rtx_costs): Fix cost estimate for nor (AND (NOT OP1) (NOT OP2)). 2013-10-22 Steve Ellcey * gcc.target/mips/nor.c: New. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 5993aab..ffb0b53 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -3796,6 +3796,18 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED, return true; } } + /* (AND (NOT op0) (NOT op1) is a nor operation that can be done in +a single instruction. */ + if (!TARGET_MIPS16 && (GET_CODE (XEXP (x, 0)) == NOT) + && (GET_CODE (XEXP (x, 1)) == NOT)) + { + rtx op0 = XEXP (x, 0); + rtx op1 = XEXP (x, 1); + cost = GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 2 : 1; + *total = COSTS_N_INSNS (cost) + set_src_cost (XEXP (op0, 0), speed) + + rtx_cost (XEXP (op1, 0), GET_CODE (op1), 1, speed); + return true; + } /* Fall through. */ diff --git a/gcc/testsuite/gcc.target/mips/nor.c b/gcc/testsuite/gcc.target/mips/nor.c new file mode 100644 index 000..e71791b --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/nor.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */ +/* { dg-final { scan-assembler-times "\tnor\t" 1 } } */ +/* { dg-final { scan-assembler-not "\tor" } } */ + +/* Test that we generate a 'nor' instruction and no 'or' instructions. */ + +NOMIPS16 int f (int a, int b) +{ + return ~(a|b); +}
Re: [patch, mips] Fix optimization bug involving nor instruction
On Tue, 2013-10-22 at 18:34 +0100, Richard Sandiford wrote: > Good spot! > We should use set_src_cost for both operands (the idea being that only > a register is allowed, and that anything else will end up being a SET_SRC). > I think the formatting should be something like: > > /* (AND (NOT op0) (NOT op1)) is a NOR operation that can be done in >a single instruction. */ > if (!TARGET_MIPS16 > && GET_CODE (XEXP (x, 0)) == NOT > && GET_CODE (XEXP (x, 1)) == NOT) > { > cost = GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 2 : 1; > *total = (COSTS_N_INSNS (cost) > + set_src_cost (XEXP (XEXP (x, 0), 0), speed) > + set_src_cost (XEXP (XEXP (x, 1), 0), speed)); > return true; > } > > OK with that change, thanks. > > Richard OK, but I am curious why you put parenthesis around the right hand side of the total expression. I.e. *total = (); Steve Ellcey
Re: [patch, mips] Fix optimization bug involving nor instruction
On Tue, 2013-10-22 at 19:12 +0100, Richard Sandiford wrote: > >> Richard > > > > OK, but I am curious why you put parenthesis around the right hand side > > of the total expression. I.e. *total = (); > > That's the "emacs formatting" rule: OK, I have checked in the patch with the parenthesis (and your other changes). Steve
Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib
On Thu, 2013-10-24 at 15:53 +0100, Marcus Shawcroft wrote: > Steve, > > Can your build be fixed allowing us to back out: > http://gcc.gnu.org/ml/fortran/2013-06/msg00038.html > > ? > > I'd really like to make some progress on this, while my proposed patch > does resolve the regression introduced by the above patch I am > concerned that this is going in the wrong direction and that we > should, as Mike suggests above fix the build issue such that autoconf > behaves, rather than attempting to hardwire configure details of > newlib into libgfortran... > > /Marcus I am not sure how we would fix the build issue to allow us to not hardcode the newlib configure details into the libgfortran configure script. The linker script that needs to be used to get a good link is different depending on what options one is compiling with on a multilib MIPS system and I have no idea on how one could create (or use) a dummy linker script. Ideally, I think we would want a check that does not depend on linking at all. Note that this problem is not just in libgfortran, it affects libstdc++ and libjava too and those configure scripts also have hardcoded the newlib assumptions. The only difference between libgfortran and the other two libraries is that the newlib assumptions for libgfortran are not static like they are for libstdc++ and libjava. They vary (for one function) based on whether or not long double is supported. If we want to come up with a better solution it should be used for all the libraries and not just for libgfortran. Steve Ellcey sell...@mips.com
[Patch] Fix canadian cross build on systems with no fenv.h
I ran into a build problem while doing a canadian cross build of GCC. I was building on linux to create a Windows (mingw) GCC that generates code for mips-mti-elf. The mips-mti-elf target uses newlib for its system headers and libraries and the headers do not include a fenv.h header file. However, when doing a canadian cross build libstdc++ is built with the C++ compiler that runs on linux (the build system), not the C++ that was just built for Windows (the host system). The problem is that the libstdc++ configure script (in GLIBCXX_CHECK_C99_TR1) is checking for the fenv.h using C++ (not C) and the C++ compiler running on linux does have an fenv.h header because the latest libstdc++ builds always create this header for C++ regardless of whether there is one on the system or not. This results in the newly created libstdc++ library having a fenv.h header that tries to include the system fenv.h header file which does not exist and the build fails. My fix for this is to explicitly check for fenv.h and complex.h in the libstdc++ configure.ac script before calling GLIBCXX_CHECK_C99_TR1. This makes the configure script check for these headers using the C compiler instead of the C++ compiler and when GLIBCXX_CHECK_C99_TR1 is run it uses that information (saved in a autoconf variable) to correctly ascertain that fenv.h does not exist. I could put these checks in GLIBCXX_CHECK_C99_TR1 if that were considered preferable, it would just have to be done before we call 'AC_LANG_CPLUSPLUS'. Tested with both my canadian cross build and a standard cross build targetting mips-mti-elf. OK for checkin? Steve Ellcey sell...@mips.com 2013-10-30 Steve Ellcey * configure.ac: Add header checks for fenv.h and complex.h. * configure: Regenerate. diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac index dd13b01..22fc840 100644 --- a/libstdc++-v3/configure.ac +++ b/libstdc++-v3/configure.ac @@ -195,6 +195,12 @@ GLIBCXX_CHECK_S_ISREG_OR_S_IFREG AC_CHECK_HEADERS(sys/uio.h) GLIBCXX_CHECK_WRITEV +# Check for fenv.h and complex.h before GLIBCXX_CHECK_C99_TR1 +# so that the check is done with the C compiler (not C++). +# Checking with C++ can break a canadian cross build if either +# file does not exist in C but does in C++. +AC_CHECK_HEADERS(fenv.h complex.h) + # For C99 support to TR1. GLIBCXX_CHECK_C99_TR1
[Patch] Fix type of clock_t in timevar.c
While working on a canadian cross build I ran into a problem with the type of clock_t. If HAVE_CLOCK_T is not defined timevar.c defines it to be int. I think the type should be long. I am using the mingw compilers on linux in my canadian cross build and HAVE_CLOCK_T is not getting set but when building I get a conflict between the definition of clock_t in /usr/i586-mingw32msvc/include/time.h (long) and the definition in timevar.c. Since mingw and linux and glibc and newlib all seem to agree on long I would like to change timevar.c to agree with them. Tested with my canadian cross build and a native x86 linux build. OK to checkin? Steve Ellcey sell...@mips.com 2013-11-01 Steve Ellcey * timevar.c: Fix type of clock_t. diff --git a/gcc/timevar.c b/gcc/timevar.c index 23b7118..b66f94a 100644 --- a/gcc/timevar.c +++ b/gcc/timevar.c @@ -23,7 +23,7 @@ along with GCC; see the file COPYING3. If not see #include "timevar.h" #ifndef HAVE_CLOCK_T -typedef int clock_t; +typedef long clock_t; #endif #ifndef HAVE_STRUCT_TMS
[Patch] configure patch for caddr_t and ssize_t types
While doing a canadian cross build I ran into a problem with the caddr_t type. configure.ac is using an obsolete version of AC_CHECK_TYPE to create #define's of caddr_t and ssize_t if they are not defined by the system. In addition to using an obsolete version of AC_CHECK_TYPE this was causing a problem in my build because caddr_t was also getting set via a typedef by the mingw compilers. Here is my fix, tested with a canadian cross build and with a native x86 linux build. OK to checkin? 2013-11-01 Steve Ellcey * configure.ac: Add header checks for fenv.h and complex.h. * configure: Regenerate. * config.in: Regnerate. * system.h: Add default caddr_t and ssize_t typedefs. diff --git a/gcc/configure.ac b/gcc/configure.ac index 5e686db..bc87073 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -1075,8 +1075,8 @@ int main() fi fi -AC_CHECK_TYPE(ssize_t, int) -AC_CHECK_TYPE(caddr_t, char *) +AC_CHECK_TYPES([ssize_t]) +AC_CHECK_TYPES([caddr_t]) GCC_AC_FUNC_MMAP_BLACKLIST diff --git a/gcc/system.h b/gcc/system.h index a1fc6de..108ec0e 100644 --- a/gcc/system.h +++ b/gcc/system.h @@ -1060,6 +1060,14 @@ helper_const_non_const_cast (const char *p) #define DEBUG_VARIABLE #endif +#ifndef HAVE_CADDR_T +typedef char *caddr_t; +#endif + +#ifndef HAVE_SSIZE_T +typedef int ssize_t +#endif + /* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT. */ #include "hwint.h"
Re: [Patch] configure patch for caddr_t and ssize_t types
On Fri, 2013-11-01 at 19:22 +0100, Marek Polacek wrote: > On Fri, Nov 01, 2013 at 11:15:02AM -0700, Steve Ellcey wrote: > > --- a/gcc/system.h > > +++ b/gcc/system.h > > @@ -1060,6 +1060,14 @@ helper_const_non_const_cast (const char *p) > > #define DEBUG_VARIABLE > > #endif > > > > +#ifndef HAVE_CADDR_T > > +typedef char *caddr_t; > > +#endif > > + > > +#ifndef HAVE_SSIZE_T > > +typedef int ssize_t > > +#endif > > Missing ';'? > > Marek Yes. Both my cross build and my native build set HAVE_SSIZE_T so I didn't see this. I really only changed ssize_t to make it consistent with caddr_t (which was causing me a problem) and to remove the use of the obsolete form of AC_CHECK_TYPE. Steve Ellcey sell...@mips.com
Re: [Patch] Fix type of clock_t in timevar.c
On Fri, 2013-11-01 at 12:43 -0700, Mike Stump wrote: > On Nov 1, 2013, at 10:47 AM, Steve Ellcey wrote: > > While working on a canadian cross build I ran into a problem with the > > type of clock_t. If HAVE_CLOCK_T is not defined > > > timevar.c defines it to be int. I think the type should be long. I am > > using the mingw > > compilers on linux in my canadian cross build and HAVE_CLOCK_T is not > > getting set but when building I get a conflict between the definition > > of clock_t in /usr/i586-mingw32msvc/include/time.h (long) and the > > definition in timevar.c. > > You should report a bug to them and have them define clock_t. They are defining clock_t, but for some reason the GCC configure is not seeing it (perhaps because of what header files get included). > > Since mingw and linux and glibc and newlib all seem to agree on long > > I would like to change timevar.c to agree with them. > > Seems reasonable to me, though newlib and macosx use unsigned long. glibc > remains at long, due to binary compatibility. Ah, I confused _CLOCK_T_ (unsigned long) with __CLOCK_T_TYPE (long) in newlib. clock_t is defined as _CLOCK_T_ in newlib, not __CLOCK_T_TYPE. Steve Ellcey sell...@mips.com
Re: [Patch] Fix type of clock_t in timevar.c
On Fri, 2013-11-01 at 13:45 -0700, Mike Stump wrote: > On Nov 1, 2013, at 12:56 PM, Steve Ellcey wrote: > >> You should report a bug to them and have them define clock_t. > > > > They are defining clock_t, but for some reason the GCC configure is not > > seeing it (perhaps because of what header files get included). > > Curious, do you have sys/time.h? If so, that's likely the cause, as system.h > does this: > > # if HAVE_SYS_TIME_H > # include > # else > # ifdef HAVE_TIME_H > # include > # endif > # endif > > thus cleverly avoiding even including time.h when it exists. I suspect > TIME_WITH_SYS_TIME failed for some silly reason, though, can't guess why. > autoconf is fun when it fails. Yes, mingw does have sys/time.h, but their sys/time.h includes so it should work. And now when I try to reproduce the problem it seems to work. I am getting HAVE_CLOCK_T defined in auto-build.h and auto-host.h. Steve Ellcey sell...@mips.com
[Patch, MIPS] Patch for PR 68400, a mips16 bug
Here is a patch for PR6400. The problem is that and_operands_ok was checking one operand to see if it was a memory_operand but MIPS16 addressing is more restrictive than what the general memory_operand allows. The fix was to call mips_classify_address if TARGET_MIPS16 is set because it will do a more complete mips16 addressing check and reject operands that do not meet the more restrictive requirements. I ran the GCC testsuite with no regressions and have included a test case as part of this patch. OK to checkin? Steve Ellcey sell...@imgtec.com 2016-01-26 Steve Ellcey PR target/68400 * config/mips/mips.c (and_operands_ok): Add MIPS16 check. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index dd54d6a..adeafa3 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -8006,9 +8006,18 @@ mask_low_and_shift_p (machine_mode mode, rtx mask, rtx shift, int maxlen) bool and_operands_ok (machine_mode mode, rtx op1, rtx op2) { - return (memory_operand (op1, mode) - ? and_load_operand (op2, mode) - : and_reg_operand (op2, mode)); + + if (memory_operand (op1, mode)) +{ + if (TARGET_MIPS16) { + struct mips_address_info addr; + if (!mips_classify_address (&addr, op1, mode, false)) + return false; + } + return and_load_operand (op2, mode); +} + else +return and_reg_operand (op2, mode); } /* The canonical form of a mask-low-and-shift-left operation is 2016-01-26 Steve Ellcey PR target/68400 * gcc.target/mips/mips.exp (mips_option_groups): Add stack-protector. * gcc.target/mips/pr68400.c: New test. diff --git a/gcc/testsuite/gcc.target/mips/mips.exp b/gcc/testsuite/gcc.target/mips/mips.exp index f191331..ff9c99a 100644 --- a/gcc/testsuite/gcc.target/mips/mips.exp +++ b/gcc/testsuite/gcc.target/mips/mips.exp @@ -257,6 +257,7 @@ set mips_option_groups { lsa "(|!)HAS_LSA" section_start "-Wl,--section-start=.*" frame-header "-mframe-header-opt|-mno-frame-header-opt" +stack-protector "-fstack-protector" } for { set option 0 } { $option < 32 } { incr option } { diff --git a/gcc/testsuite/gcc.target/mips/pr68400.c b/gcc/testsuite/gcc.target/mips/pr68400.c index e69de29..1099568 100644 --- a/gcc/testsuite/gcc.target/mips/pr68400.c +++ b/gcc/testsuite/gcc.target/mips/pr68400.c @@ -0,0 +1,28 @@ +/* PR target/pr68400 + This was triggering an ICE in change_address_1 when compiled with -Os. */ + +/* { dg-do compile } */ +/* { dg-options "-fstack-protector -mips16" } */ + +typedef struct s { + unsigned long long d; + long long t; +} p; + +int sh(int x, unsigned char *buf) +{ + p *uhdr = (p *)buf; + unsigned int i = 0; + uhdr->d = ((uhdr->d & 0xff00LL) >> 56) +| ((uhdr->d & 0xff00LL) >> 24) +| ((uhdr->d & 0xff00LL) << 8) +| ((uhdr->d & 0x00ffLL) << 56); + uhdr->t = ((uhdr->t & 0xff00LL) >> 56) +| ((uhdr->t & 0xff00LL) >> 24) +| ((uhdr->t & 0x00ffLL) >> 8) +| ((uhdr->t & 0xff00LL) << 8) +| ((uhdr->t & 0xff00LL) << 40) +| ((uhdr->t & 0x00ffLL) << 56); + i += 4; + if (x < i) return 0; else return 1; +}
Re: [PATCH] libcpp: use better locations for _Pragma tokens (preprocessor/69126)
David, This patch has broken the top-of-tree glibc build. I get an error on an undef that is supposed to be protected by Pragmas. It happens when compiling intl/plural.c in glibc. I created a preprocessed source file but when I compile it, the error doesn't happen. I will try to create a small unpreprocessed standalone test case but maybe this is enough for you to know what is happening. If I look at the preprocessed code, the statement in question is under ingore warning pragmas. #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wuninitialized" #pragma GCC diagnostic ignored "-Wmaybe-uninitialized" *++yyvsp = __gettextlval; #pragma GCC diagnostic pop Steve Ellcey sell...@imgtec.com plural.c: In function '__gettextparse': plural.c:1767:12: error: '__gettextlval.num' may be used uninitialized in this function [-Werror=maybe-uninitialized] *++yyvsp = yylval; plural.c:66:25: note: '__gettextlval.num' was declared here #define yylval __gettextlval ^ plural.c:1265:9: note: in expansion of macro 'yylval' YYSTYPE yylval YY_INITIAL_VALUE(yyval_default); ^~ cc1: all warnings being treated as errors
Re: [PATCH] libcpp: use better locations for _Pragma tokens (preprocessor/69126)
On Thu, 2016-01-28 at 14:59 -0500, David Malcolm wrote: > On Thu, 2016-01-28 at 14:48 -0500, David Malcolm wrote: > > On Thu, 2016-01-28 at 11:12 -0800, Steve Ellcey wrote: > > > David, > > > > > > This patch has broken the top-of-tree glibc build > > > > Bother; sorry. FWIW, I'm about to get on a plane (for FOSDEM), so > > I'm > > not in a great position to tackle this yet. > > [...] > > I've filed PR preprocessor/69543 to cover this regression: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543 > > Should the patches be reverted in the meantime?(r232893 and r232928) I can live with the problem for a few days, so guess it depends on when you think you will have time to look at it more. Steve Ellcey
[Patch, MIPS] Fix PR target/68273, passing args in wrong regs
This is a patch for PR 68273, where MIPS is passing arguments in the wrong registers. The problem is that when passing an argument by value, mips_function_arg_boundary was looking at the type of the argument (and not just the mode), and if that argument was a variable with extra alignment info (say an integer variable with 8 byte alignment), MIPS was aligning the argument on an 8 byte boundary instead of a 4 byte boundary the way it should. Since we are passing values (not variables), the alignment of the variable that the value is copied from should not affect the alignment of the value being passed. This patch fixes the problem and it could change what registers arguments are passed in, which means it could affect backwards compatibility with older programs. But since the current behaviour is not compliant with the MIPS ABI and does not match what LLVM does, I think we have to make this change. For the most part this will only affect arguments which are copied from variables that have non-standard alignments set by the aligned attribute, however the SRA optimization pass can create aligned variables as it splits aggregates apart and that was what triggered this bug report. This is basically the same bug as the ARM bug PR 65956 and the fix is pretty much the same too. Rather than create MIPS specific tests that check the use of specific registers I created two tests to put in gcc.c-torture/execute that were failing before because GCC on MIPS was not consistent in where arguments were passed and which now work with this patch. Tested with mips-mti-linux-gnu and no regressions. OK to checkin? Steve Ellcey sell...@imgtec.com 2016-01-29 Steve Ellcey PR target/68273 * config/mips/mips.c (mips_function_arg_boundary): Fix argument alignment. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index dd54d6a..ecce3cd 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -5643,8 +5643,9 @@ static unsigned int mips_function_arg_boundary (machine_mode mode, const_tree type) { unsigned int alignment; - - alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode); + alignment = type && mode == BLKmode + ? TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) + : GET_MODE_ALIGNMENT (mode); if (alignment < PARM_BOUNDARY) alignment = PARM_BOUNDARY; if (alignment > STACK_BOUNDARY) 2016-01-29 Steve Ellcey PR target/68273 * gcc.c-torture/execute/pr68273-1.c: New test. * gcc.c-torture/execute/pr68273-2.c: New test. diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c b/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c index e69de29..3ce07c6 100644 --- a/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c +++ b/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c @@ -0,0 +1,74 @@ +/* Make sure that the alignment attribute on an argument passed by + value does not affect the calling convention and what registers + arguments are passed in. */ + +extern void exit (int); +extern void abort (void); + +typedef int alignedint __attribute__((aligned(8))); + +int __attribute__((noinline)) +foo1 (int a, alignedint b) +{ return a + b; } + +int __attribute__((noinline)) +foo2 (int a, int b) +{ + return a + b; +} + +int __attribute__((noinline)) +bar1 (alignedint x) +{ + return foo1 (1, x); +} + +int __attribute__((noinline)) +bar2 (alignedint x) +{ + return foo1 (1, (alignedint) 99); +} + +int __attribute__((noinline)) +bar3 (alignedint x) +{ + return foo1 (1, x + (alignedint) 1); +} + +alignedint q = 77; + +int __attribute__((noinline)) +bar4 (alignedint x) +{ + return foo1 (1, q); +} + + +int __attribute__((noinline)) +bar5 (alignedint x) +{ + return foo2 (1, x); +} + +int __attribute__((noinline)) +use_arg_regs (int i, int j, int k) +{ + return i+j-k; +} + +int main() +{ + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (foo1 (19,13) != 32) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar1 (-33) != -32) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar2 (1) != 100) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar3 (17) != 19) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar4 (-33) != 78) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar5 (-84) != -83) abort (); + exit (0); +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c b/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c index e69de29..1661be9 100644 --- a/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c @@ -0,0 +1,109 @@ +/* Make sure that the alignment attribute on an argument passed by + value does not affect the calling convention and what registers + arguments are passed in. */ + +extern void exit (int); +extern void abort (void); + +typedef struct s { + char c; + char d; +} t1; +typedef
Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs
On Sat, 2016-01-30 at 11:06 +, Richard Sandiford wrote: > I'm not sure this patch is safe. The received wisdom used to be that > ABIs should be defined in terms of types, not modes, since types > represent the source code while modes are an internal GCC concept > that could change over time (although in practice the barrier for > that is pretty high). > > The patch that introduced the line being patched was part of a series > that fixed ABI incompatibilities with MIPSpro, and IIRC some of the > incompatibilties really were due to us looking at modes when we should > have been looking at types. My main reason for looking at mode instead of type for non-aggregates was because of argument promotion. When I looked at a function with a char argument, the type was char but the mode was SI. Now I could still use the alignment of char (1 byte) since it would get extended to 4 bytes in mips_function_arg_boundary by: if (alignment < PARM_BOUNDARY) alignment = PARM_BOUNDARY; But it still seemed a bit 'wrong' to me. Maybe something in the front/middle ends should be updating the type to match any argument promotion that is being done? Steve Ellcey sell...@imgtec.com
Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs
On Sat, 2016-01-30 at 11:06 +, Richard Sandiford wrote: > We need to be careful of examples like: > > struct __attribute__ ((aligned (8))) s { _Complex float x; }; > void foo (struct s *ptr, struct s val) { *ptr = val; } > > "x" gets SCmode, which has an alignment of 4. And it's OK for TYPE_MODE > to have a smaller alignment than the type -- it's just not allowed to > have a larger alignment (and even that restriction only applies because > this is a STRICT_ALIGNMENT target). So the structure itself inherits > this SCmode. > > The patch therefore changes how we handle foo() for -mabi=32 -msoft-float. > Before the patch "val" is passed in $6 and $7. After the patch it's > passed in $5 and $6. clang behaves like the unpatched GCC. > > If instead we use: > > struct __attribute__ ((aligned (8))) s { float x; float y; }; > void foo (struct s *ptr, struct s val) { *ptr = val; } > > then the structure has BLKmode and the alignment is honoured both before > and after the patch. > > There's no real ABI reason for handling the two cases differently. > The fact that one gets BLKmode and the other doesn't is down > to GCC internals. > > We also have to be careful about the STRICT_ALIGNMENT thing. > At the moment that's hard-coded to 1 for MIPS, but it's possible that > it could become configurable in future, like it is for aarch64 and > rs6000. !STRICT_ALIGNMENT allows TYPE_MODE to have a larger > alignment than the type, so underaligned structures would get modes > when they didn't previously. That would in turn change how they're > passed as arguments. > > Thanks, > Richard Richard, Can you explain why the GCC internals cause us to get SCmode instead of BLKmode for the example with _Complex? I don't understand that. It seems wrong to me and I don't understand where it is coming from. Steve Ellcey sell...@imgtec.com
Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs
Here is a new patch for PR target/68273. It makes the GCC calling conventions compatible with LLVM so that the two agree with each other in all the cases I could think of testing and it fixes the reported defect. I couldn't get the GCC compatibility test to work (see https://gcc.gnu.org/ml/gcc/2016-02/msg00017.html) so I wasn't able to use that for compatibility testing, instead I hand examined routines with various argument types (structures, ints, complex, etc) to see what registers GCC and LLVM were accessing. This change does introduce an ABI/compatibility change with GCC itself and that is obviously a concern. Basically, any type that is passed by value and has an external alignment applied to it may get passed in different registers because we strip off that alignment (via TYPE_MAIN_VARIANT) before determining the alignment of the variable. If we don't want to break the GCC compatibility then we will continue to have an incompatibility with LLVM and we will need to find another way to deal with the aligned int variable that SRA is creating and passing to a function that expects a 'normal' integer. One thought I had was that we could compare TYPE_ALIGN (type) and TYPE_ALIGN (TYPE_MAIN_VARIANT (type) and if they are different, issue a warning during compilation about a possible incompatibility with older objects. Steve Ellcey sell...@imgtec.com 2016-02-03 Steve Ellcey PR target/68273 * config/mips/mips.c (mips_function_arg_boundary): Fix argument alignment. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 697abc2..4aa215f 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -5644,7 +5644,10 @@ mips_function_arg_boundary (machine_mode mode, const_tree type) { unsigned int alignment; - alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode); + alignment = type + ? TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) + : GET_MODE_ALIGNMENT (mode); + if (alignment < PARM_BOUNDARY) alignment = PARM_BOUNDARY; if (alignment > STACK_BOUNDARY) 2016-02-03 Steve Ellcey PR target/68273 * gcc.c-torture/execute/pr68273-1.c: New test. * gcc.c-torture/execute/pr68273-2.c: New test. diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c b/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c index e69de29..3ce07c6 100644 --- a/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c +++ b/gcc/testsuite/gcc.c-torture/execute/pr68273-1.c @@ -0,0 +1,74 @@ +/* Make sure that the alignment attribute on an argument passed by + value does not affect the calling convention and what registers + arguments are passed in. */ + +extern void exit (int); +extern void abort (void); + +typedef int alignedint __attribute__((aligned(8))); + +int __attribute__((noinline)) +foo1 (int a, alignedint b) +{ return a + b; } + +int __attribute__((noinline)) +foo2 (int a, int b) +{ + return a + b; +} + +int __attribute__((noinline)) +bar1 (alignedint x) +{ + return foo1 (1, x); +} + +int __attribute__((noinline)) +bar2 (alignedint x) +{ + return foo1 (1, (alignedint) 99); +} + +int __attribute__((noinline)) +bar3 (alignedint x) +{ + return foo1 (1, x + (alignedint) 1); +} + +alignedint q = 77; + +int __attribute__((noinline)) +bar4 (alignedint x) +{ + return foo1 (1, q); +} + + +int __attribute__((noinline)) +bar5 (alignedint x) +{ + return foo2 (1, x); +} + +int __attribute__((noinline)) +use_arg_regs (int i, int j, int k) +{ + return i+j-k; +} + +int main() +{ + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (foo1 (19,13) != 32) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar1 (-33) != -32) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar2 (1) != 100) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar3 (17) != 19) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar4 (-33) != 78) abort (); + if (use_arg_regs (999, 999, 999) != 999) abort (); + if (bar5 (-84) != -83) abort (); + exit (0); +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c b/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c index e69de29..1661be9 100644 --- a/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/pr68273-2.c @@ -0,0 +1,109 @@ +/* Make sure that the alignment attribute on an argument passed by + value does not affect the calling convention and what registers + arguments are passed in. */ + +extern void exit (int); +extern void abort (void); + +typedef struct s { + char c; + char d; +} t1; +typedef t1 t1_aligned8 __attribute__((aligned(8))); +typedef t1 t1_aligned16 __attribute__((aligned(16))); +typedef t1 t1_aligned32 __attribute__((aligned(32))); + +int bar1(int a, t1 b) +{ + return a + b.c + b.d; +} + +int bar2(int a, t1_aligned8 b) +{ + return a + b.c + b.d; +} +
Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs
On Thu, 2016-02-04 at 12:09 +0100, Eric Botcazou wrote: > > So this doesn’t fix aarch64, c6x, epiphany, ia64, iq2000, rs6000, rx, sparc, > > tilegx, tilepro or xtensa. > > :-( That’s one of the problems by having each port copy and paste swaths of > > :code from other ports to express the same thing instead of ports sharing > > :just one copy of code. My port is also broken in the same way > > :(currently). > > Yes, fixing a compiler bug by changing the ABI is a no-no, and the argument > of > the compatibility with LLVM has IMO little merit since it's a GCC extension. But it is a GCC extension that is implemented in LLVM. It's just implemented differently there. We either have to change GCC, change LLVM, or live with the difference. In any of these cases I wonder if we should change GCC so that the code below generates warnings. If the two types are going to get passed differently we shouldn't allow calls with one type to call a function expecting the other type to happen with no warning or error. Currently this code does not warn even with -Wall. typedef int alignedint __attribute__((aligned(8))); extern void foo (alignedint a); extern void bar (int); int a; alignedint b; int main() { foo(a); bar(b); } Steve Ellcey sell...@imgtec.com
[Patch, MIPS] Patch for PR 68273 (user aligned variable arguments)
Here is a new patch for PR 68273. The original problem with gsoap has been fixed by changing GCC to not create overly-aligned variables in the SRA pass but the MIPS specific problem of how user-aligned variables are passed to functions remains. Because MIPS GCC is internally inconsistent, it seems like changing the passing convention for user aligned variables on MIPS is the best option, even though this is an ABI change for MIPS. For example: typedef int alignedint __attribute__((aligned(8))); int foo1 (int x, alignedint y) { return x+y; } int foo2 (int x, int y) { return x+y; } int foo3 (int x, alignedint y) { return x+y; } int foo4 (int x, int y) { return x+y; } int splat1(int a, alignedint b) { foo1(a,b); } int splat2(int a, alignedint b) { foo2(a,b); } int splat3(int a, int b) { foo3(a,b); } int splat4(int a, int b) { foo4(a,b); } In this case foo1 and foo3 would expect the second argument to be in register $6, but foo2 and foo4 would exect it in register $5. Likewise splat1 and splat2 would pass the second argument in $6, but splat3 and splat4 would pass it in $5. This means that the call from splat2 to foo2 and the call from splat3 to foo3 would be wrong in that the caller is putting the argument in one register but the callee is getting out of a different register. In none of these cases would GCC give a warning about the inconsistent parameter passing Since this patch could cause a change in the users program, I have added a warning that will be emitted whenever a user passes an aligned type or when a parameter is declared as an aligned type and that alignment could cause a change in how the value is passed. I also added a way to turn that warning off in case the user doesn't want to see it (-mno-warn-aligned-args). I did not add an option to make GCC pass arguments in the old manner as I consider that method of passing arguments to be a bug and I don't think we want to have an option to propogate that incorrect behavior. Steve Ellcey sell...@imgtec.com 2016-02-24 Steve Ellcey PR target/68273 * config/mips/mips.opt (mwarn-aligned-args): New flag. * config/mips/mips.h (mips_args): Add new field. * config/mips/mips.c (mips_internal_function_arg): New function. (mips_function_incoming_arg): New function. (mips_old_function_arg_boundary): New function. (mips_function_arg): Rewrite to use mips_internal_function_arg. (mips_function_arg_boundary): Fix argument alignment. (TARGET_FUNCTION_INCOMING_ARG): New define. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 697abc2..05465c1 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -1124,6 +1124,7 @@ static const struct mips_rtx_cost_data static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool); static int mips_register_move_cost (machine_mode, reg_class_t, reg_class_t); +static unsigned int mips_old_function_arg_boundary (machine_mode, const_tree); static unsigned int mips_function_arg_boundary (machine_mode, const_tree); static machine_mode mips_get_reg_raw_mode (int regno); @@ -5459,11 +5460,11 @@ mips_strict_argument_naming (cumulative_args_t ca ATTRIBUTE_UNUSED) return !TARGET_OLDABI; } -/* Implement TARGET_FUNCTION_ARG. */ +/* Used to implement TARGET_FUNCTION_ARG and TARGET_FUNCTION_INCOMING_ARG. */ static rtx -mips_function_arg (cumulative_args_t cum_v, machine_mode mode, - const_tree type, bool named) +mips_internal_function_arg (cumulative_args_t cum_v, machine_mode mode, + const_tree type, bool named) { CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); struct mips_arg_info info; @@ -5586,6 +5587,50 @@ mips_function_arg (cumulative_args_t cum_v, machine_mode mode, return gen_rtx_REG (mode, mips_arg_regno (&info, TARGET_HARD_FLOAT)); } +/* Implement TARGET_FUNCTION_ARG. */ + +static rtx +mips_function_arg (cumulative_args_t cum_v, machine_mode mode, + const_tree type, bool named) +{ + bool doubleword_aligned_p = (mips_function_arg_boundary (mode, type) + > BITS_PER_WORD); + bool old_doubleword_aligned_p = (mips_old_function_arg_boundary (mode, type) + > BITS_PER_WORD); + CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); + + if (doubleword_aligned_p != old_doubleword_aligned_p + && mips_warn_aligned_args && !cum->aligned_arg_warning_given) +{ + warning (0, "argument %d in the call may be passed in a manner incompatible with previous GCC versions", cum->arg_number+1); + cum->aligned_arg_warning_given = 1; +} + + return mips_internal_function_arg (cum_v, mode, type, named); +} + +/* Implement TARGET_FUNCTION_INCOMING_ARG. */ + +static rtx +mips_function_i
RE: [Patch, MIPS] Patch for PR 68273 (user aligned variable arguments)
On Wed, 2016-02-24 at 13:46 -0800, Matthew Fortune wrote: > Thanks for enumerating all the cases. I'd not looked at all of them. I > do agree that we need a fix given the existing inconsistencies. > > One question I have is where does an over aligned argument get pushed > to the stack with the patch in place. I.e. when taking its address, is > the alignment achieved up to the limit of stack alignment or do they now > only get register-size alignment? If the former then the idea that > argument passing is defined as a structure in memory with the first > portion in registers will no longer hold true. Not sure if that is a > problem. Passing arguments on the stack is not affected by this change, except that we may start using memory at a different argument because of changes in the number of registers we use to pass arguments. The extra alignment was already being ignored for arguments passed on the stack and was only affecting register arguments. So we are now more consistent in terms of argument passing looking like a structure. Those structure 'fields' now do not include the user defined alignment for register args or memory args. An example: typedef int alignedint __attribute__((aligned(8))); int foo1 (int a, int b, int c, int d, int e, int x, alignedint y, int z) { return a+b+c+d+e+x+y+z; } e, x, y, and z were being passed at offsets 16, 20, 24, and 28 before this change and after this change. > > diff --git a/gcc/testsuite/gcc.dg/pr48335-2.c > > b/gcc/testsuite/gcc.dg/pr48335-2.c > > index a37c079..f56b6fd 100644 > > --- a/gcc/testsuite/gcc.dg/pr48335-2.c > > +++ b/gcc/testsuite/gcc.dg/pr48335-2.c > > @@ -1,6 +1,7 @@ > > /* PR middle-end/48335 */ > > /* { dg-do compile } */ > > /* { dg-options "-O2 -fno-tree-sra" } */ > > +/* { dg-options "-O2 -fno-tree-sra -mno-warn-aligned-args" { target > > mips*-*-* } } */ > > Presumably this means that both under alignment of 64-bit types and over > alignment of < 64-bit types are affected? The explicit test cases do not > cover an under aligned long long case which would be good to cover. Yes, the under alignment of long long's is affected. I have added a new testcase (pr68273-3.c) to check that. Here is a new version of the patch with the invoke.texi text, the new testcase, and the style changes you pointed out. Steve Ellcey sell...@imgtec.com 2016-02-24 Steve Ellcey PR target/68273 * config/mips/mips.opt (mwarn-aligned-args): New flag. * config/mips/mips.h (mips_args): Add new field. * config/mips/mips.c (mips_internal_function_arg): New function. (mips_function_incoming_arg): New function. (mips_old_function_arg_boundary): New function. (mips_function_arg): Rewrite to use mips_internal_function_arg. (mips_function_arg_boundary): Fix argument alignment. (TARGET_FUNCTION_INCOMING_ARG): New define. * doc/invoke.texi (MIPS Options): Document new flag. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 5af3d1e..ea4733b 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -1124,6 +1124,7 @@ static const struct mips_rtx_cost_data static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool); static int mips_register_move_cost (machine_mode, reg_class_t, reg_class_t); +static unsigned int mips_old_function_arg_boundary (machine_mode, const_tree); static unsigned int mips_function_arg_boundary (machine_mode, const_tree); static machine_mode mips_get_reg_raw_mode (int regno); @@ -5459,11 +5460,11 @@ mips_strict_argument_naming (cumulative_args_t ca ATTRIBUTE_UNUSED) return !TARGET_OLDABI; } -/* Implement TARGET_FUNCTION_ARG. */ +/* Used to implement TARGET_FUNCTION_ARG and TARGET_FUNCTION_INCOMING_ARG. */ static rtx -mips_function_arg (cumulative_args_t cum_v, machine_mode mode, - const_tree type, bool named) +mips_internal_function_arg (cumulative_args_t cum_v, machine_mode mode, + const_tree type, bool named) { CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); struct mips_arg_info info; @@ -5586,6 +5587,54 @@ mips_function_arg (cumulative_args_t cum_v, machine_mode mode, return gen_rtx_REG (mode, mips_arg_regno (&info, TARGET_HARD_FLOAT)); } +/* Implement TARGET_FUNCTION_ARG. */ + +static rtx +mips_function_arg (cumulative_args_t cum_v, machine_mode mode, + const_tree type, bool named) +{ + bool doubleword_aligned_p = (mips_function_arg_boundary (mode, type) + > BITS_PER_WORD); + bool old_doubleword_aligned_p = (mips_old_function_arg_boundary (mode, type) + > BITS_PER_WORD); + CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); + + if (doubleword_aligned_p != old_doubl
Re: [PATCH, ia64] [PR target/52731] internal compiler error: in ia64_st_address_bypass_p, at config/ia64/ia64.c:9357
On Fri, 2013-11-29 at 15:01 +0300, Kirill Yukhin wrote: > Hello, > On 20 Nov 18:37, Kirill Yukhin wrote: > > Hello, > > Patch in the bottom fixes PR52731. > > Is it ok for trunk? > Ping? > > -- > Thanks, K OK. Steve Ellcey sell...@mips.com
[Patch] Remove references to non-existent tree-flow.h file
While looking at PR 59335 (plugin doesn't build) I saw the comments about tree-flow.h and tree-flow-inline.h not existing anymore. While these files have been removed there are still some references to them in Makefile.in, doc/tree-ssa.texi, and a couple of source files. This patch removes the references to these now-nonexistent files. OK to checkin? Steve Ellcey sell...@mips.com 2014-01-09 Steve Ellcey * Makefile.in (TREE_FLOW_H): Remove. (TREE_SSA_H): Add files names from tree-flow.h. * doc/tree-ssa.texi (Annotations): Remove reference to tree-flow.h * tree.h: Remove tree-flow.h reference. * hash-table.h: Remove tree-flow.h reference. * tree-ssa-loop-niter.c (dump_affine_iv): Replace tree-flow.h reference with tree-ssa-loop.h. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 459b1ba..8eb4f68 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -929,11 +929,10 @@ CPP_ID_DATA_H = $(CPPLIB_H) $(srcdir)/../libcpp/include/cpp-id-data.h CPP_INTERNAL_H = $(srcdir)/../libcpp/internal.h $(CPP_ID_DATA_H) TREE_DUMP_H = tree-dump.h $(SPLAY_TREE_H) $(DUMPFILE_H) TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H) -TREE_FLOW_H = tree-flow.h tree-flow-inline.h tree-ssa-operands.h \ +TREE_SSA_H = tree-ssa.h tree-ssa-operands.h \ $(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \ $(HASHTAB_H) $(CGRAPH_H) $(IPA_REFERENCE_H) \ tree-ssa-alias.h -TREE_SSA_H = tree-ssa.h $(TREE_FLOW_H) PRETTY_PRINT_H = pretty-print.h $(INPUT_H) $(OBSTACK_H) TREE_PRETTY_PRINT_H = tree-pretty-print.h $(PRETTY_PRINT_H) GIMPLE_PRETTY_PRINT_H = gimple-pretty-print.h $(TREE_PRETTY_PRINT_H) diff --git a/gcc/doc/tree-ssa.texi b/gcc/doc/tree-ssa.texi index 391dba8..e0238bd 100644 --- a/gcc/doc/tree-ssa.texi +++ b/gcc/doc/tree-ssa.texi @@ -53,9 +53,6 @@ variable has aliases. All these attributes are stored in data structures called annotations which are then linked to the field @code{ann} in @code{struct tree_common}. -Presently, we define annotations for variables (@code{var_ann_t}). -Annotations are defined and documented in @file{tree-flow.h}. - @node SSA Operands @section SSA Operands diff --git a/gcc/hash-table.h b/gcc/hash-table.h index 2b04067..034385c 100644 --- a/gcc/hash-table.h +++ b/gcc/hash-table.h @@ -1050,10 +1050,7 @@ hash_table ::end () /* Iterate through the elements of hash_table HTAB, using hash_table <>::iterator ITER, - storing each element in RESULT, which is of type TYPE. - - This macro has this form for compatibility with the - FOR_EACH_HTAB_ELEMENT currently defined in tree-flow.h. */ + storing each element in RESULT, which is of type TYPE. */ #define FOR_EACH_HASH_TABLE_ELEMENT(HTAB, RESULT, TYPE, ITER) \ for ((ITER) = (HTAB).begin (); \ diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 5a10297..7628363 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -1311,7 +1311,7 @@ dump_affine_iv (FILE *file, affine_iv *iv) if EVERY_ITERATION is true, we know the test is executed on every iteration. The results (number of iterations and assumptions as described in - comments at struct tree_niter_desc in tree-flow.h) are stored to NITER. + comments at struct tree_niter_desc in tree-ssa-loop.h) are stored to NITER. Returns false if it fails to determine number of iterations, true if it was determined (possibly with some assumptions). */ diff --git a/gcc/tree.h b/gcc/tree.h index fa79b6f..67454b7 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1114,9 +1114,6 @@ extern void protected_set_expr_location (tree, location_t); the given label expression. */ #define LABEL_EXPR_LABEL(NODE) TREE_OPERAND (LABEL_EXPR_CHECK (NODE), 0) -/* VDEF_EXPR accessors are specified in tree-flow.h, along with the other - accessors for SSA operands. */ - /* CATCH_EXPR accessors. */ #define CATCH_TYPES(NODE) TREE_OPERAND (CATCH_EXPR_CHECK (NODE), 0) #define CATCH_BODY(NODE) TREE_OPERAND (CATCH_EXPR_CHECK (NODE), 1)
[Patch, testsuite, mips] Fix test gcc.dg/delay-slot-1.c for MIPS
The gcc.dg/delay-slot-1.c test is failing for MIPS targets that do not support the 64 bit ABI because it didn't check to see if that support existed before using the -mabi=64 flag. This patch fixes the problem by using the mips64 check. OK to checkin? Steve Ellcey sell...@mips.com 2014-01-09 Steve Ellcey * gcc.dg/delay-slot-1.c: Add check for 64 bit support. diff --git a/gcc/testsuite/gcc.dg/delay-slot-1.c b/gcc/testsuite/gcc.dg/delay-slot-1.c index f3bcd8e..bfc0273 100644 --- a/gcc/testsuite/gcc.dg/delay-slot-1.c +++ b/gcc/testsuite/gcc.dg/delay-slot-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2" } */ -/* { dg-options "-O2 -mabi=64" { target mips-*-linux-* } } */ +/* { dg-options "-O2 -mabi=64" { target { mips*-*-linux* && mips64 } } } */ struct offset_v1 { int k_uniqueness;
[Patch] Fix PR plugin/59335, plugins not compiling
Some header files needed by plugins are not present in the installed plugin include directory. This patch adds those headers. I added all the headers I needed for my plugin plus the ones mentioned in the patch. I did not add the x86 specific *.def files but this should fix all the generic platform issues that are mentioned in the defect. Tested with my switch-shortcut plugin on MIPS. OK to checkin? Steve Ellcey sell...@mips.com 2014-01-09 Steve Ellcey PR plugins/59335 * Makefile.in (PLUGIN_HEADERS): Add gimplify.h, gimple-iterator.h, gimple-ssa.h, fold-const.h, tree-cfg.h, tree-into-ssa.h, tree-ssanames.h, print-tree.h, varasm.h, and context.h. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 8eb4f68..76766ba 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3118,7 +3118,9 @@ PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ cppdefault.h flags.h $(MD5_H) params.def params.h prefix.h tree-inline.h \ $(GIMPLE_PRETTY_PRINT_H) realmpfr.h \ $(IPA_PROP_H) $(TARGET_H) $(RTL_H) $(TM_P_H) $(CFGLOOP_H) $(EMIT_RTL_H) \ - version.h stringpool.h + version.h stringpool.h gimplify.h gimple-iterator.h gimple-ssa.h \ + fold-const.h tree-cfg.h tree-into-ssa.h tree-ssanames.h print-tree.h \ + varasm.h context.h # generate the 'build fragment' b-header-vars s-header-vars: Makefile
[Patch, MIPS] Fix SYSROOT_SUFFIX_SPEC for mips-mti-linux-gnu
This patch enables builds with mips[32|64]r3 and mips[32|64]r5 in the mips-mti-linux-gnu toolchain. t-mti-linux uses MULTILIB_MATCHES to map these to r2 but SYSROOT_SUFFIX_SPEC was not being set properly to find the sysroot (the r2 one) for these architectures. This patch fixes that problem by updating MIPS_SYSVERSION_SPEC which is used by SYSROOT_SUFFIX_SPEC. It will only affect the mips-mti-linux-gnu toolchain. Tested with mips-mti-linux-gnu, OK to checkin? Steve Ellcey sell...@imgtec.com 2015-07-09 Steve Ellcey * config/mips/mti-linux.h (MIPS_SYSVERSION_SPEC): Update to handle mips[32|64]r3 and mips[32|64]r5. diff --git a/gcc/config/mips/mti-linux.h b/gcc/config/mips/mti-linux.h index 03d1baa..b497625 100644 --- a/gcc/config/mips/mti-linux.h +++ b/gcc/config/mips/mti-linux.h @@ -17,10 +17,14 @@ You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>. */ -/* This target is a multilib target, specify the sysroot paths. */ -#define MIPS_SYSVERSION_SPEC \ -"%{mips32:r1}%{mips64:r1}%{mips32r2:r2}%{mips64r2:r2}" \ -"%{mips32r6:r6}%{mips64r6:r6}%{mips16:-mips16}" +/* This target is a multilib target, specify the sysroot paths. + MIPS_SYSVERSION_SPEC defaults to 'r2' (mips32r2 or mips64r2) unless + 'r1' or 'r6' are specifically given so that mips32r3, mips32r5, + mips64r3, and mips64r5 will all default to 'r2'. See MULTILIB_MATCHES + definition in t-mti-linux. */ + +#define MIPS_SYSVERSION_SPEC \ +"%{mips32|mips64:r1;mips32r6|mips64r6:r6;:r2}%{mips16:-mips16}" #undef SYSROOT_SUFFIX_SPEC #define SYSROOT_SUFFIX_SPEC\
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
Marek, I have run into a problem with this warning and building glibc. sysdeps/ieee754/s_matherr.c has: int weak_function __matherr(struct exception *x) { int n=0; if(x->arg1!=x->arg1) return 0; return n; } And arg1 is a floating point type. I think that if the value of x->xarg1 is a NaN then the if statement should return TRUE because a NaN never compares equal to anything, even another NaN (check with your local IEEE expert). I believe this method of checking for a NaN is fairly common and I am not sure if GCC should be emitting a warning for it. Steve Ellcey sell...@imgtec.com
[Patch, MIPS] MIPS specific optimization for o32 ABI
This patch implements a MIPS o32 ABI specific optimization called frame header optimization. In the o32 ABI, routines allocate 16 bytes on the stack before calling another routine. This space is used by the callee as space to write the register arguments to if their address is taken. The n32 and n64 ABI's use the more common approach of copying register arguments to local variables if their address is needed. This optimization allows the callee to use that 16 bytes for other purposes if it does not need it to write its arguments out to memory and if it only needs 16 bytes of stack space (or less) for saving callee-saved registers. This can allow us to avoid having to allocate extra stack space in a routine and to remove the stack pointer increment/decrement instructions from the prolog and epilogue which results in a small performance improvement. This patch has been in the Mentor GCC toolchain for MIPS for a while and gotten some testing there and I tested it on the top-of-tree GCC sources with no regressions. OK to checkin? Steve Ellcey sell...@imgtec.com 2015-07-28 Steve Ellcey Zoran Jovanovic Catherine Moore Tom de Vries * config/mips/mips.opt (mframe-header-opt): New option. * config/mips/mips.c (struct mips_frame_info): Add skip_stack_frame_allocation_p field. (struct machine_function): Add callees_use_frame_header_p, uses_frame_header_p, and initial_total_size fields. (mips_frame_header_usage): New hash. (mips_find_if_frame_header_is_used): New Function. (mips_callee_use_frame_header): New Function. (mips_callees_use_frame_header_p): New Function. (mips_cfun_use_frame_header_p): New Function. (mips_get_updated_offset): New Function. (mips_skip_stack_frame_alloc): New Function. (mips_frame_header_update_insn): New Function. (mips_rest_of_frame_header_opt): New function. (mips_compute_frame_info): Add recalculate and frame arguments. (mips_frame_pointer_required): Add new args to mips_compute_frame_info call. (mips_initial_elimination_offset): Ditto. (mips_gp_expand_needed_p): New function factored out of mips_expand_ghost_gp_insns. (mips_expand_ghost_gp_insns): Use mips_gp_expand_needed_p. (mips_reorg): Use mips_rest_of_frame_header_opt. 2015-07-28 Steve Ellcey Tom de Vries * gcc.target/mips/fho-1.c: New test. * gcc.target/mips/fho-2.c: New test. * gcc.target/mips/mips.exp: Add -mframe-header-opt to mips_option_groups. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index c3cd52d..7cdef89 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -77,6 +77,7 @@ along with GCC; see the file COPYING3. If not see #include "cgraph.h" #include "builtins.h" #include "rtl-iter.h" +#include "dumpfile.h" /* This file should be included last. */ #include "target-def.h" @@ -380,6 +381,9 @@ struct GTY(()) mips_frame_info { /* The offset of hard_frame_pointer_rtx from the bottom of the frame. */ HOST_WIDE_INT hard_frame_pointer_offset; + + /* Skip stack frame allocation if possible. */ + bool skip_stack_frame_allocation_p; }; /* Enumeration for masked vectored (VI) and non-masked (EIC) interrupts. */ @@ -472,6 +476,15 @@ struct GTY(()) machine_function { /* True if this is an interrupt handler that should use DERET instead of ERET. */ bool use_debug_exception_return_p; + + /* True if some of the callees uses its frame header. */ + bool callees_use_frame_header_p; + + /* True if current function uses its frame header. */ + bool uses_frame_header_p; + + /* Frame size before updated by optimizations. */ + HOST_WIDE_INT initial_total_size; }; /* Information about a single argument. */ @@ -574,6 +587,8 @@ struct mips_rtx_cost_data /* Global variables for machine-dependent things. */ +static hash_map *mips_frame_header_usage; + /* The -G setting, or the configuration's default small-data limit if no -G option is given. */ static unsigned int mips_small_data_threshold; @@ -1296,6 +1311,7 @@ static const struct mips_rtx_cost_data } }; +static void mips_rest_of_frame_header_opt (void); static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool); static int mips_register_move_cost (machine_mode, reg_class_t, reg_class_t); @@ -10358,6 +10374,114 @@ mips_save_reg_p (unsigned int regno) return false; } +/* Try to find if function may use its incoming frame header. */ + +static bool +mips_find_if_frame_header_is_used (tree fndecl) +{ + bool *frame_header_unused; + + if (mips_frame_header_usage) +frame_header_unused = mips_frame_header_usage->get (fndecl); + else +frame_header_unused = false; + + return !frame_header_unused; +} + +/* Return true i
Re: [Patch, MIPS] MIPS specific optimization for o32 ABI
On Fri, 2015-07-31 at 00:32 +, Joseph Myers wrote: > New command-line options need documenting in invoke.texi. Good point, thanks for catching that. Here is an updated patch with invoke.texi. There are no other changes to the patch. Steve Ellcey sell...@imgtec.com 2015-07-31 Steve Ellcey Zoran Jovanovic Catherine Moore Tom de Vries * config/mips/mips.opt (mframe-header-opt): New option. * config/mips/mips.c (struct mips_frame_info): Add skip_stack_frame_allocation_p field. (struct machine_function): Add callees_use_frame_header_p, uses_frame_header_p, and initial_total_size fields. (mips_frame_header_usage): New hash. (mips_find_if_frame_header_is_used): New Function. (mips_callee_use_frame_header): New Function. (mips_callees_use_frame_header_p): New Function. (mips_cfun_use_frame_header_p): New Function. (mips_get_updated_offset): New Function. (mips_skip_stack_frame_alloc): New Function. (mips_frame_header_update_insn): New Function. (mips_rest_of_frame_header_opt): New function. (mips_compute_frame_info): Add recalculate and frame arguments. (mips_frame_pointer_required): Add new args to mips_compute_frame_info call. (mips_initial_elimination_offset): Ditto. (mips_gp_expand_needed_p): New function factored out of mips_expand_ghost_gp_insns. (mips_expand_ghost_gp_insns): Use mips_gp_expand_needed_p. (mips_reorg): Use mips_rest_of_frame_header_opt. * doc/invoke.texi (MIPS Options): Document -mframe-header-opt flag. 2015-07-31 Steve Ellcey Tom de Vries * gcc.target/mips/fho-1.c: New test. * gcc.target/mips/fho-2.c: New test. * gcc.target/mips/mips.exp: Add -mframe-header-opt to mips_option_groups. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index c3cd52d..7cdef89 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -77,6 +77,7 @@ along with GCC; see the file COPYING3. If not see #include "cgraph.h" #include "builtins.h" #include "rtl-iter.h" +#include "dumpfile.h" /* This file should be included last. */ #include "target-def.h" @@ -380,6 +381,9 @@ struct GTY(()) mips_frame_info { /* The offset of hard_frame_pointer_rtx from the bottom of the frame. */ HOST_WIDE_INT hard_frame_pointer_offset; + + /* Skip stack frame allocation if possible. */ + bool skip_stack_frame_allocation_p; }; /* Enumeration for masked vectored (VI) and non-masked (EIC) interrupts. */ @@ -472,6 +476,15 @@ struct GTY(()) machine_function { /* True if this is an interrupt handler that should use DERET instead of ERET. */ bool use_debug_exception_return_p; + + /* True if some of the callees uses its frame header. */ + bool callees_use_frame_header_p; + + /* True if current function uses its frame header. */ + bool uses_frame_header_p; + + /* Frame size before updated by optimizations. */ + HOST_WIDE_INT initial_total_size; }; /* Information about a single argument. */ @@ -574,6 +587,8 @@ struct mips_rtx_cost_data /* Global variables for machine-dependent things. */ +static hash_map *mips_frame_header_usage; + /* The -G setting, or the configuration's default small-data limit if no -G option is given. */ static unsigned int mips_small_data_threshold; @@ -1296,6 +1311,7 @@ static const struct mips_rtx_cost_data } }; +static void mips_rest_of_frame_header_opt (void); static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool); static int mips_register_move_cost (machine_mode, reg_class_t, reg_class_t); @@ -10358,6 +10374,114 @@ mips_save_reg_p (unsigned int regno) return false; } +/* Try to find if function may use its incoming frame header. */ + +static bool +mips_find_if_frame_header_is_used (tree fndecl) +{ + bool *frame_header_unused; + + if (mips_frame_header_usage) +frame_header_unused = mips_frame_header_usage->get (fndecl); + else +frame_header_unused = false; + + return !frame_header_unused; +} + +/* Return true if the instruction is a call and the called function may use its + incoming frame header. */ + +static bool +mips_callee_use_frame_header (rtx_insn *insn) +{ + rtx call_insn; + tree fndecl; + + if (insn == NULL_RTX || !USEFUL_INSN_P (insn)) +return false; + + /* Handle sequence of instructions. */ + if (GET_CODE (PATTERN (insn)) == SEQUENCE) +{ + rtx_insn *subinsn; + FOR_EACH_SUBINSN (subinsn, insn) + if (INSN_P (subinsn) && mips_callee_use_frame_header (subinsn)) + return true; +} + + if (GET_CODE (insn) != CALL_INSN) +return false; + + if (GET_CODE (PATTERN (insn)) != PARALLEL + || GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) != SET) +return true; + + call_insn = SET_SRC (X
Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables
On Mon, 2015-09-14 at 09:50 +0200, Bernd Schmidt wrote: > On 09/13/2015 08:24 PM, Mark Wielaard wrote: > > commit 97505bd0e4ac15d86c2a302cfebc5f1a4fc2c2e8 > > Author: Mark Wielaard > > Date: Fri Sep 11 23:54:15 2015 +0200 > > > > PR28901 -Wunused-variable ignores unused const initialised variables > > in C > > This is ok. > > > Bernd I am not sure I like this change. It broke the GLIBC build for me on MIPS. Basically GLIBC has a header file with initialized static constant globals (sysdeps/ieee754/dbl-64/atnat2.h contains tqpi1 and qpi1) and that header file is included in multiple .c files like sysdeps/ieee754/dbl-64/e_atan2.c that use some, but not all, of those static constant variables. But between the various .c files all of the globals are used somewhere, just not in every individual .c file. This seems like a perfectly reasonable use of static globals and header files that should not be identified as a warning. This warning causes the GLIBC build to fail because GLIBC is compiled with -Wall -Werror. Steve Ellcey sell...@imgtec.com % cat a.h static const int a = 3; static const int b = 5; static const int c = 4; % cat a.c #include "a.h" int foo() { return a + b;} % gcc -O2 -Werror -Wall -c a.c In file included from a.c:1:0: a.h:3:18: error: 'c' defined but not used [-Werror=unused-const-variable] static const int c = 4; ^ cc1: all warnings being treated as errors
Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables
On Tue, 2015-09-15 at 19:10 +0200, Jakub Jelinek wrote: > On Tue, Sep 15, 2015 at 10:02:15AM -0700, Steve Ellcey wrote: > > I am not sure I like this change. It broke the GLIBC build for me on > > MIPS. Basically GLIBC has a header file with initialized static > > constant globals (sysdeps/ieee754/dbl-64/atnat2.h contains tqpi1 and > > qpi1) and that header file is included in multiple .c files like > > Multiple? All I can see is e_atan2.c including that header file, nothing > else. Whoops, bad assumption on my part. I thought it must be included somewhere else, otherwise why put it in a header. > > sysdeps/ieee754/dbl-64/e_atan2.c that use some, but not all, of those > > static constant variables. But between the various .c files all of the > > globals are used somewhere, just not in every individual .c file. This > > seems like a perfectly reasonable use of static globals and header files > > that should not be identified as a warning. > > I disagree. While const vars are special in C++, it is really like > any other variable in C, so the warning is IMHO appropriate. > > Jakub I guess it is not the 'const' I think should be handled special but the 'static'. Having unused static variables (const or not) declared in a header file but unused seems reasonable since the header file may be included in multiple .c files each of which uses a subset of the static variables. Steve Ellcey sell...@imgtec.com
Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables
On Fri, 2015-09-18 at 20:29 -0600, Martin Sebor wrote: > On 09/15/2015 11:20 AM, Steve Ellcey wrote: > > I guess it is not the 'const' I think should be handled special but the > > 'static'. Having unused static variables (const or not) declared in a > > header file but unused seems reasonable since the header file may be > > included in multiple .c files each of which uses a subset of the static > > variables. > > I tend to agree. I suppose diagnosing unused non-const static > definitions might be helpful but I can't think of a good reason > to diagnose unused initialized static consts in C. Especially > since they're not diagnosed in C++. > > Would diagnosing them in source files while avoiding the warning > for static const definitions in headers be an acceptable compromise? > > Martin That seems like a reasonable compromise to me. Steve Ellcey sell...@imgtec.com
Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables
On Thu, 2015-09-24 at 13:56 +0200, Bernd Schmidt wrote: > I think at this point we have reports of just two packages generating > extra warnings, with the warnings at least justifiable in both cases. So > my vote would be to leave things as-is for now and see if more reports > come in. It is after all expected that a new warning option generates > new warnings. > > > Bernd At least one of the warnings in glibc is not justified (in my opinion). The header file timezone/private.h defines time_t_min and time_t_max. These are not used in any of the timezone files built by glibc but if you look at the complete tz package they are used when building other objects that are not part of the glibc tz component and that include private.h. I would make two arguments about why I don't think we should warn. One is that 'static int const foo = 1' seems a lot like '#define foo 1' and we don't complain about the macro foo not being used. If we complain about the unused const, why not complain about the unused macro? We don't complain because we know it would result in too many warnings in existing code. If we want people to move away from macros, and I think we do, then we should not make it harder to do so by introducing new warnings when they change. The other is that C++ does not complain about this. I know that C and C++ are different languages with different rules but it seems like this difference is a difference that doesn't have to exist. Either both should complain or neither should complain. I can't think of any valid reason for one to complain and the other not to. I think using the used attribute is probably a reasonable way to address this issue if we continue to generate the warning but I still feel it is a bad warning in that it will (sometimes) warn about a coding style that seems perfectly reasonable to me. Steve Ellcey sell...@imgtec.com
RFC: Patch to allow spill slot alignment greater than the stack alignment
I have spent some time trying to do dynamic stack alignment on MIPS and had considerable trouble. The problems are mainly due to the dwarf based stack unwinding and setjmp/longjmp usages where the code does not go through the normal function prologue and epilogue code. Since the only need I have for a dynamically aligned stack is to ensure that MSA registers (128 bits long, wanting 128 bit alignment) get spilled to an aligned address on an ABI where the stack is only guaranteed to be 64 bit aligned, I decided to try an approach based on how local variables are aligned beyond the supported stack alignment. I allocate the space with padding and then create an aligned pointer into that space. Since I want to do this in lra_spill and I can't allocate a new register to use at that point what I do is create an 'aligned spill base register' based on the frame_register in expand_prologue, and then in assign_mem_slot, if I want an alignment greater than what the stack supports, I allocate extra space and change the MEM reference for the spill from 'frame_register + offset' to 'aligned_spill_base_register + offset'. The main advantage to this approach over dynamically aligning the stack is that by not changing the real stack (or frame) pointer there is minimal chance of breaking the ABI and there are no changes needed to the dwarf unwind code. The main disadvantage is that I am padding each individual spill so I am wasting more space than absolutely required. It should be possible to address this by putting all the aligned spills together and sharing the padding but I would like to leave that for a future improvement. In the mean time I would like to get some comments on this approach and see what people think. Does this seem like a reasonable approach to allowing for aligned spills beyond what the stack supports? I have tested this on MIPS, forcing 64 bit registers to be aligned, even though the stack supports this natively as a way to test the code before MSA is checked in and it seems to be working with no regressions. I do get scan failures in the gcc.target/mips section because I am forcing GCC to setup $16 as the 'aligned spill base register' in all routines as a stress test but I do not get any new execution failures. Steve Ellcey sell...@imgtec.com 2015-10-02 Steve Ellcey * config/mips/mips.c (mips_attribute_table): Add align_spills and no_align_spills attribute entries. (mips_cfun_has_msa_p): New function. (setup_align_basereg_p): Ditto. (mips_align_spill_p): Ditto.. (mips_align_spill_base_regnum): Ditto.. (mips_align_spill_maximum): Ditto.. (mips_expand_prologue): Setup aligned spill register when needed. (mips_conditional_register_usage): Reserve aligned spill register when needed. (TARGET_ALIGN_SPILL_P): Set to mips_align_spill_p. (TARGET_ALIGN_SPILL_BASE_REGNUM): Set to mips_align_spill_base_regnum. (TARGET_ALIGN_SPILL_MAXIMUM): Set to mips_align_spill_maximum. * config/mips/mips.opt (malign-spills): New option. * df-scan.c (df_get_regular_block_artificial_uses): Check for usage of aligned spill register. (df_get_eh_block_artificial_uses): Ditto. * hooks.c (hook_uint_void_invalid_regnum): New function. * hooks.h (hook_uint_void_invalid_regnum): Ditto. * lra-spills.c (align_slot_address): New function. (assign_mem_slot): Align slot address if necessary. * target.def (align_spill_p): New hook. (align_spill_base_regnum): New hook. (align_spill_maximum): New hook. * tm.texi.in (TARGET_ALIGN_SPILL_P): New hook. (TARGET_ALIGN_SPILL_BASE_REGNUM): Ditto. (TARGET_ALIGN_SPILL_MAXIMUM): Ditto. * tm.texi: Regenerate. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 0e0ecf2..84c3724 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -773,6 +773,8 @@ static const struct attribute_spec mips_attribute_table[] = { mips_handle_use_shadow_register_set_attr, false }, { "keep_interrupts_masked", 0, 0, false, true, true, NULL, false }, { "use_debug_exception_return", 0, 0, false, true, true, NULL, false }, + { "align_spills", 0, 0, true, false, false, NULL, false }, + { "no_align_spills", 0, 0, true, false, false, NULL, false }, { NULL, 0, 0, false, false, false, NULL, false } }; @@ -1607,6 +1609,49 @@ mips_merge_decl_attributes (tree olddecl, tree newdecl) DECL_ATTRIBUTES (newdecl)); } +/* Determine if a function may use MSA registers and thus need + aligned spills. */ + +static bool +mips_cfun_has_msa_p (void) +{ + /* For now, for testing, assume all functions use MSA + (and thus may need aligned spills ). */ +#if 0 + if (!cfun || !TARGET_MSA) +return FALSE; + + for (insn = get_insns ();
Re: RFC: Patch to allow spill slot alignment greater than the stack alignment
On Mon, 2015-10-05 at 10:41 +0200, Bernd Schmidt wrote: > On 10/02/2015 10:57 PM, Steve Ellcey wrote: > > I have spent some time trying to do dynamic stack alignment on MIPS and had > > considerable trouble. The problems are mainly due to the dwarf based stack > > unwinding and setjmp/longjmp usages where the code does not go through the > > normal function prologue and epilogue code. > [...] > > The main advantage to this approach over dynamically aligning the stack > > is that by not changing the real stack (or frame) pointer there is > > minimal chance of breaking the ABI and there are no changes needed to > > the dwarf unwind code. The main disadvantage is that I am padding each > > individual spill so I am wasting more space than absolutely required. > > It should be possible to address this by putting all the aligned spills > > together and sharing the padding but I would like to leave that for a > > future improvement. > > > > In the mean time I would like to get some comments on this approach and > > see what people think. Does this seem like a reasonable approach to > > allowing for aligned spills beyond what the stack supports? > > Personally I'm not a fan. Your description of it makes it sound > immensely wasteful, and I'm really not clear on why stack alignment > wouldn't work for MIPS when it's been shown to work elsewhere. I think > we'd want to see a clear demonstration of unfixable problems with stack > alignment before allowing something like this in. > > Vlad would have to comment on the LRA bits, probably. > > > Bernd There probably is some way to get dynamic stack alignment to work on MIPS, but I am not sure I can do it. The only platform that I see that uses dynamic stack alignment is x86. I think the difficulties in getting this to work correctly is why no other platform has implemented it. The most common response I have gotten when asking around for help on dynamic stack alignment is usually "just break the ABI". My approach does waste some space, on MIPS I would be allocating 32 bytes of stack space to spill a 16 byte MSA register, but the hope/belief is that MSA registers would not get spilled very often. Steve Ellcey
Re: RFC: Patch to allow spill slot alignment greater than the stack alignment
On Mon, 2015-10-05 at 09:21 -0700, H.J. Lu wrote: > On Mon, Oct 5, 2015 at 9:10 AM, Steve Ellcey wrote: > > There probably is some way to get dynamic stack alignment to work on > > MIPS, but I am not sure I can do it. The only platform that I see that > > uses dynamic stack alignment is x86. I think the difficulties in > > getting this to work correctly is why no other platform has implemented > > it. The most common response I have gotten when asking around for help > > on dynamic stack alignment is usually "just break the ABI". > > > > My approach does waste some space, on MIPS I would be allocating 32 > > bytes of stack space to spill a 16 byte MSA register, but the > > hope/belief is that MSA registers would not get spilled very often. > > We keep track stack frame information precisely in x86 backend, > including unwind info, and we fixed DWARF unwind info generation to > support it. We used gcc_assert to verify that everything is in > sync. > > I don't know what is missing to support MIPS dynamic stack alignment. > Unless DWARF unwind info isn't sufficient for MIPS, I don't see why > dynamic stack alignment can't be done for MIPS. If you get the wrong > DWARF unwind info, you can add assert to GCC source to track down > its origin and fix assert to generate the correct DWARF unwind info. The problem is that I don't know what is missing either. I don't know what the 'correct' stack frame information should look like so I don't really know what I am trying to generate. readelf cannot decode the unwind section of MIPS objects so I can't look at things that way and I have been trying to work based on what .cfi directives I think I should be generating and that has not been going well. One example of an issue I have run into is with the DWARF unwind generation and 'Rule 16' in dwarf2cfi.c. It assumes the AND instruction has an integer constant argument but MIPS can't do an AND with a constant like -16 so it has to put it in a register first and do the AND with the register. I just hacked that code as a temporary workaround but its the sort of x86 assumption that I have run into due to the fact that no platform other than x86 currently does dynamic stack alignment. Steve Ellcey sell...@imgtec.com
RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI
On Mon, 2015-09-28 at 22:10 +, Moore, Catherine wrote: > Hi Steve, I'm sorry for the delay in reviewing this patch. > Some changes have been committed upstream (see revision #227941) that will > require updates to this patch. > Please post the update for review. Other comments are embedded. OK, I have updated the comments based on your input and changed the code to compile with the ToT GCC after revision @227941. Here is the new patch. 2015-10-05 Steve Ellcey * config.gcc (mips*-*-*): Add frame-header-opt.o to extra_objs. * frame-header-opt.c: New file. * config/mips/mips-proto.h (mips_register_frame_header_opt): Add prototype. * config/mips/mips.c (mips_compute_frame_info): Check optimize_call_stack flag. (mips_option_override): Register new frame_header_opt pass. (mips_frame_info, mips_int_mask, mips_shadow_set, machine_function): Move these types to... * config/mips/mips.h: here. (machine_function): Add does_not_use_frame_header and optimize_call_stack fields. * config/mips/t-mips (frame-header-opt.o): Add new make rule. * doc/invoke.texi (-mframe-header-opt, -mno-frame-header-opt): Document new flags. * config/mips/mips.opt (mframe-header-opt): Add new option. diff --git a/gcc/config.gcc b/gcc/config.gcc index 56797bd..7bf66f8 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -420,6 +420,7 @@ microblaze*-*-*) mips*-*-*) cpu_type=mips extra_headers="loongson.h" + extra_objs="frame-header-opt.o" extra_options="${extra_options} g.opt fused-madd.opt mips/mips-tables.opt" ;; nds32*) diff --git a/gcc/config/mips/frame-header-opt.c b/gcc/config/mips/frame-header-opt.c new file mode 100644 index 000..7c7b1f2 --- /dev/null +++ b/gcc/config/mips/frame-header-opt.c @@ -0,0 +1,216 @@ +/* Analyze functions to determine if callers need to allocate a frame header + on the stack. The frame header is used by callees to save their arguments. + This optimization is specific to TARGET_OLDABI targets. For TARGET_NEWABI + targets, if a frame header is required, it is allocated by the callee. + + + Copyright (C) 2015 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +GCC is distributed in the hope that it will be useful, but WITHOUT +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + + +#include "config.h" +#include "system.h" +#include "context.h" +#include "coretypes.h" +#include "tree.h" +#include "tree-core.h" +#include "tree-pass.h" +#include "target.h" +#include "target-globals.h" +#include "cfg.h" +#include "cgraph.h" +#include "function.h" +#include "basic-block.h" +#include "gimple.h" +#include "gimple-iterator.h" +#include "gimple-walk.h" + +static unsigned int frame_header_opt (void); + +namespace { + +const pass_data pass_data_ipa_frame_header_opt = +{ + IPA_PASS, /* type */ + "frame-header-opt", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_CGRAPHOPT, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_ipa_frame_header_opt : public ipa_opt_pass_d +{ +public: + pass_ipa_frame_header_opt (gcc::context *ctxt) +: ipa_opt_pass_d (pass_data_ipa_frame_header_opt, ctxt, + NULL, /* generate_summary */ + NULL, /* write_summary */ + NULL, /* read_summary */ + NULL, /* write_optimization_summary */ + NULL, /* read_optimization_summary */ + NULL, /* stmt_fixup */ + 0, /* function_transform_todo_flags_start */ + NULL, /* function_transform */ + NULL) /* variable_transform */ + {} + + /* opt_pass methods: */ + virtual bool gate (function *) +{ + /* This optimization has no affect if TARGET_NEWABI. If optimize + is not at least 1 then the data needed for the optimization is + not available and nothing will be done anyway. */ + return TARGET_OLDABI && flag_frame_header_optimization; +} + + virtual unsigned int execute (funct
Re: RFC: Patch to allow spill slot alignment greater than the stack alignment
On Mon, 2015-10-05 at 09:57 -0700, H.J. Lu wrote: > You need to update dwarf2cfi.c to generate proper unwind info for > whatever frame instructions MIPS generates, like what we did for > x86 dynamic stack realignment. The problem is understanding what the 'proper' unwind info is and figuring out what is wrong about it when it doesn't work. I used Bernd's comment about Rule #6 to handle the sp = sp AND reg issue, but I have a couple of more dwarf/cfi questions. One, can you explicitly make a copy of the stack pointer to another register and not make that register the new stack pointer? I want to save the old stack pointer before aligning it but when I do that I think that dwarfcfi.c automatically assumes that the new register is now the stack pointer and that is not what I want. I want the stack pointer to remain as the original register. My other question is about 'set_unexpected' and how that affects the generated unwind info. I noticed that a lot of my failing tests use 'set_unexpected' and I don't know what this function does or how it affects the generated code that would cause these tests in particular to fail. Steve Ellcey sell...@imgtec.com
Re: RFC: Patch to allow spill slot alignment greater than the stack alignment
On Tue, 2015-10-06 at 17:39 +0200, Bernd Schmidt wrote: > > Did your tag that copy as RTX_FRAME_RELATED? I'd expect dwarf2cfi would > ignore instructions with that bit unset. There's even a comment > indicating arm does something like this: Yes, I had marked it as RTX_FRAME_RELATED. When I took that out things looked better (well, maybe just different). If I remove that and I change Rule #16 to handle an AND with a register I get odd looking .cfi stuff. The AND instruction (which is marked with RTX_FRAME_RELATED) seems to generate these cfi_escape macros: .cfi_escape 0x10,0x1f,0x2,0x8e,0x7c .cfi_escape 0x10,0x1e,0x2,0x8e,0x78 .cfi_escape 0x10,0xc,0x2,0x8e,0x74 which are meaningless to me. What I found works better is to skip the dwarf2cfi.c changes and explicitly attach this note to the AND instruction: cfi_note = alloc_reg_note (REG_CFA_DEF_CFA, stack_pointer_rtx, NULL_RTX); REG_NOTES (insn) = cfi_note; When I do this the only unexpected test suite execution failures I see are ones that call the C++ set_unexpected function. FYI: The reason I was copying the stack pointer to another register was to use that other register as my argument pointer (needed after the stack pointer got aligned) and to use it to restore the stack pointer at the end of the function for normal returns. Steve Ellcey sell...@imgtec.com
Re: RFC: Patch to allow spill slot alignment greater than the stack alignment
On Tue, 2015-10-06 at 11:10 -0700, H.J. Lu wrote: > Does it pass all tests under g++.dg/torture/stackalign? You need > to implement -mstackrealign and -mpreferred-stack-boundary= > as well as update check_effective_target_automatic_stack_alignment > to run all stack alignment tests. No, those tests were not run. I have not implemented the -mstackrealign or -mpreferred-stack-boundary options. I do not think those are the options I want for stack realignment on MIPS. The only reason for implementing stack realignment on MIPS is to align MSA (vector) register spills. Yes, stack realignment can also be used with aligned local variables but those could also be handled with the existing alloca method without implementing stack realignment. So I want an option (turned on by default when using -mmsa) that only aligns a function if that function contains MSA register usage. I don't want an option to align every function because that would slow things down for no reason. I should and will try these test cases with my option. Right now my testing forces all functions to be aligned even if they don't use MSA because that gives me maximum testing, but the final code will only align functions that use MSA. Steve Ellcey sell...@imgtec.com
RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI
On Tue, 2015-10-06 at 12:02 +, Moore, Catherine wrote: > > Moore, Catherine writes: > > > The patch itself looks good, but the tests that you added need a little > > > more > > work. Here is a new patch for just the tests. I added NOMIPS16 and the -mabi=32 flag as Matthew suggested and I also added -mno-abicalls. Without the -mno-abicalls the MIPS linux compiler defaulted to -mabicalls and allocated 32 bytes without the optimization and the MIPS elf compiler defaulted to -mno-abicalls and allocated 24 bytes without the optimization. This synchronizes them so they both allocate 24 bytes without the optimization and 8 bytes with it. Everything should pass now, it passed for me using mips-mti-linux-gnu and mips-mti-elf. Steve Ellcey sell...@imgtec.com 2015-10-06 Steve Ellcey * gcc.target/mips/mips.exp (mips_option_groups): Add -mframe-header-opt and -mno-frame-header-opt options. * gcc.target/mips/frame-header-1.c: New file. * gcc.target/mips/frame-header-2.c: New file. * gcc.target/mips/frame-header-3.c: New file. diff --git a/gcc/testsuite/gcc.target/mips/frame-header-1.c b/gcc/testsuite/gcc.target/mips/frame-header-1.c new file mode 100644 index 000..971656d --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/frame-header-1.c @@ -0,0 +1,21 @@ +/* Verify that we do not optimize away the frame header in foo when using + -mno-frame-header-opt by checking the stack pointer increment done in + that function. Without the optimization foo should increment the stack + by 24 bytes, with the optimization it would only be 8 bytes. */ + +/* { dg-do compile } */ +/* { dg-options "-mno-frame-header-opt -mabi=32 -mno-abicalls" } */ +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */ +/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-24" } } */ + +NOMIPS16 void __attribute__((noinline)) +bar (int* a) +{ + *a = 1; +} + +NOMIPS16 void +foo (int a) +{ + bar (&a); +} diff --git a/gcc/testsuite/gcc.target/mips/frame-header-2.c b/gcc/testsuite/gcc.target/mips/frame-header-2.c new file mode 100644 index 000..0e86bc9 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/frame-header-2.c @@ -0,0 +1,21 @@ +/* Verify that we do optimize away the frame header in foo when using + -mframe-header-opt by checking the stack pointer increment done in + that function. Without the optimization foo should increment the + stack by 24 bytes, with the optimization it would only be 8 bytes. */ + +/* { dg-do compile } */ +/* { dg-options "-mframe-header-opt -mabi=32 -mno-abicalls" } */ +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */ +/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-8" } } */ + +NOMIPS16 void __attribute__((noinline)) +bar (int* a) +{ + *a = 1; +} + +NOMIPS16 void +foo (int a) +{ + bar (&a); +} diff --git a/gcc/testsuite/gcc.target/mips/frame-header-3.c b/gcc/testsuite/gcc.target/mips/frame-header-3.c new file mode 100644 index 000..2a8c515 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/frame-header-3.c @@ -0,0 +1,22 @@ +/* Verify that we do not optimize away the frame header in foo when using + -mframe-header-opt but are calling a weak function that may be overridden + by a different function that does need the frame header. Without the + optimization foo should increment the stack by 24 bytes, with the + optimization it would only be 8 bytes. */ + +/* { dg-do compile } */ +/* { dg-options "-mframe-header-opt -mabi=32 -mno-abicalls" } */ +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */ +/* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-24" } } */ + +NOMIPS16 void __attribute__((noinline, weak)) +bar (int* a) +{ + *a = 1; +} + +void +NOMIPS16 foo (int a) +{ + bar (&a); +} diff --git a/gcc/testsuite/gcc.target/mips/mips.exp b/gcc/testsuite/gcc.target/mips/mips.exp index 42e7fff..0f2d6a2 100644 --- a/gcc/testsuite/gcc.target/mips/mips.exp +++ b/gcc/testsuite/gcc.target/mips/mips.exp @@ -256,6 +256,7 @@ set mips_option_groups { maddps "HAS_MADDPS" lsa "(|!)HAS_LSA" section_start "-Wl,--section-start=.*" +frame-header "-mframe-header-opt|-mno-frame-header-opt" } for { set option 0 } { $option < 32 } { incr option } {
Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables
On Thu, 2015-09-24 at 21:24 -0400, Trevor Saunders wrote: > On Thu, Sep 24, 2015 at 06:55:11PM +0200, Bernd Schmidt wrote: > > On 09/24/2015 06:11 PM, Steve Ellcey wrote: > > >At least one of the warnings in glibc is not justified (in my opinion). > > >The header file timezone/private.h defines time_t_min and time_t_max. > > >These are not used in any of the timezone files built by glibc but if > > >you look at the complete tz package they are used when building other > > >objects that are not part of the glibc tz component and that include > > >private.h. > > > > The standard C way of writing this would be to declare time_t_min in the > > header and have its definition in another file, or use a TIME_T_MIN macro as > > glibc does in mktime.c. That file even has a local redefinition: > > time_t time_t_min = TIME_T_MIN; > > So at the very least the warning points at code that has some oddities. > > I can believe its an odd way to write C, but is it actually a bad one? > I expect if I got warnings for code like that I'd be pretty unhappy > about either moving the constant out where the compiler can't always see > it, or making it a macro. > > > >I would make two arguments about why I don't think we should warn. > > > > > >One is that 'static int const foo = 1' seems a lot like '#define foo 1' > > >and we don't complain about the macro foo not being used. If we > > >complain about the unused const, why not complain about the unused > > >macro? We don't complain because we know it would result in too many > > >warnings in existing code. If we want people to move away from macros, > > >and I think we do, then we should not make it harder to do so by > > >introducing new warnings when they change. > > > > > >The other is that C++ does not complain about this. I know that C and > > >C++ are different languages with different rules but it seems like this > > >difference is a difference that doesn't have to exist. Either both > > >should complain or neither should complain. I can't think of any valid > > >reason for one to complain and the other not to. > > > > Well, they _are_ different languages, and handling of const is one place > > where they differ. For example, C++ consts can be used in places where > > constant expressions are required. The following is a valid C++ program but > > not a C program: > > > > const int v = 200; > > int t[v]; > > > > The result is that the typical programming style for C is to have constants > > #defined, while for C++ you can find more examples like the above; I recall > > Stroustrup explicitly advocating that in the introductory books I read 20 > > years ago, and using it as a selling point for C++. Existing practice is > > important when deciding what to warn about, and for the moment I remain > > convinced that C practice is sufficiently different from C++. > > existing practice is certainly important, but I would say that what is > good practice is also very important. It seems to me that warning for > these constants is basically making it hard to follow a better practice > than the existing one. That seems pretty unfortunate. > > On the other hand I've become much more of a C++ programmer than a C one > so, I'm probably not the best judge. > > Trev > > > > > > > Bernd So, is there any consensus on this issue? I cannot build top-of-tree glibc with top-of-tree GCC due to this warning coming from timezone/private.h. If GCC is going to keep this warning then I should talk to the owners of the header file (tz group, not glibc) and see if they would be willing to add a unused attribute to it. Personally, I would rather have GCC not issue the warning, but that is just based on my opinion that the definition of a (possibly) unused static initialized variable in a C header is a reasonable practice. Steve Ellcey sell...@imgtec.com
Re: RFC: Patch to allow spill slot alignment greater than the stack alignment
On Tue, 2015-10-06 at 11:10 -0700, H.J. Lu wrote: > Does it pass all tests under g++.dg/torture/stackalign? You need > to implement -mstackrealign and -mpreferred-stack-boundary= > as well as update check_effective_target_automatic_stack_alignment > to run all stack alignment tests. FYI: I was able to run those tests with my stack alignment flag and without any failures. I notice that none of them use set_unexpected, which is where I am having problems on MIPS. Have you tried running any of the following tests that use set_unexpected with your stack alignment flags? They fail or me on MIPS and I was wondering if they work for you on x86. g++.dg/cpp0x/lambda/lambda-eh2.C g++.dg/eh/unexpected1.C g++.old-deja/g++.eh/spec2.C g++.old-deja/g++.eh/spec3.C g++.old-deja/g++.mike/eh33.C g++.old-deja/g++.mike/eh50.C g++.old-deja/g++.mike/eh51.C Steve Ellcey sell...@imgtec.com
Re: RFC: Patch to allow spill slot alignment greater than the stack alignment
On Fri, 2015-10-09 at 15:36 -0700, H.J. Lu wrote: > > I am not sure what you were asking. I tried: > > make check-g++ RUNTESTFLAGS="--target_board='unix{-m32\ > -mstackrealign}' old-deja.exp=spec*.C" > ... > > === g++ Summary === > > # of expected passes 495 > # of expected failures 3 > > make check-g++ RUNTESTFLAGS="--target_board='unix{-m32\ > -mstackrealign}' old-deja.exp=eh*.C" > ... > === g++ Summary === > > # of expected passes 372 OK, that was what I was wondering about. I wasn't sure if you had run the entire test suite with -mstackrealign or only the tests in g++.dg/torture/stackalign. Steve Ellcey sell...@imgtec.com
[Patch, MIPS] Frame header optimization for MIPS (part 2)
Here is the second part of the MIPS frame header optimization patch. The first part avoided allocating a frame header if it knew that none of the functions that it called would need it. This part uses a frame header to save callee saved registers if doing so will allow the function to avoid having to allocate stack space (thus avoiding the need to increment and decrement the stack pointer at all). I did a little reorganization of my first patch as part of this, seperating the is_leaf_function check from needs_frame_header_p and making it a seperate check from callees_functions_use_frame_header. This allowed me to reuse the needs_frame_header_p function and the does_not_use_frame_header field that stores its value in this new patch. The optimimization may be more conservative than it has to be in some cases like functions with floating point arguments but it catches the most important cases and we can extend it later if needed. Tested with no regressions using the mips-mti-linux-gnu toolchain. OK to checkin? Steve Ellcey sell...@imgtec.com 2015-10-16 Steve Ellcey * frame-header-opt.c (gate): Check for optimize > 0. (has_inlined_assembly): New function. (needs_frame_header_p): Remove is_leaf_function check, add argument type check. (callees_functions_use_frame_header): Add is_leaf_function and has_inlined_assembly calls.. (set_callers_may_not_allocate_frame): New function. (frame_header_opt): Add is_leaf_function call, add set_callers_may_not_allocate_frame call. * config/mips/mips.c (mips_compute_frame_info): Add check to see if callee saved regs can be put in frame header. (mips_expand_prologue): Add check to see if step1 is zero, fix cfa restores when using frame header to store regs. (mips_can_use_return_insn): Check to see if registers are stored in frame header. * config/mips/mips.h (machine_function): Add callers_may_not_allocate_frame and use_frame_header_for_callee_saved_regs fields. diff --git a/gcc/config/mips/frame-header-opt.c b/gcc/config/mips/frame-header-opt.c index 7c7b1f2..5512838 100644 --- a/gcc/config/mips/frame-header-opt.c +++ b/gcc/config/mips/frame-header-opt.c @@ -79,7 +79,7 @@ public: /* This optimization has no affect if TARGET_NEWABI. If optimize is not at least 1 then the data needed for the optimization is not available and nothing will be done anyway. */ - return TARGET_OLDABI && flag_frame_header_optimization; + return TARGET_OLDABI && flag_frame_header_optimization && (optimize > 0); } virtual unsigned int execute (function *) { return frame_header_opt (); } @@ -125,6 +125,29 @@ is_leaf_function (function *fn) return true; } +/* Return true if this function has inline assembly code or if we cannot + be certain that it does not. False if know that there is no inline + assembly. */ + +static bool +has_inlined_assembly (function *fn) +{ + basic_block bb; + gimple_stmt_iterator gsi; + + /* If we do not have a cfg for this function be conservative and assume + it is may have inline assembly. */ + if (fn->cfg == NULL) +return true; + + FOR_EACH_BB_FN (bb, fn) +for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + if (gimple_code (gsi_stmt (gsi)) == GIMPLE_ASM) + return true; + + return false; +} + /* Return true if this function will use the stack space allocated by its caller or if we cannot determine for certain that it does not. */ @@ -136,20 +159,26 @@ needs_frame_header_p (function *fn) if (fn->decl == NULL) return true; - if (fn->stdarg || !is_leaf_function (fn)) + if (fn->stdarg) return true; for (t = DECL_ARGUMENTS (fn->decl); t; t = TREE_CHAIN (t)) { if (!use_register_for_decl (t)) - return true; + return true; + + /* Some 64 bit types may get copied to general registers using the frame +header, see mips_output_64bit_xfer. Checking for SImode only may be + overly restrictive but it is gauranteed to be safe. */ + if (DECL_MODE (t) != SImode) + return true; } return false; } -/* Returns TRUE if the argument stack space allocated by function FN is used. - Returns FALSE if the space is needed or if the need for the space cannot +/* Return true if the argument stack space allocated by function FN is used. + Return false if the space is needed or if the need for the space cannot be determined. */ static bool @@ -177,6 +206,8 @@ callees_functions_use_frame_header (function *fn) called_fn = DECL_STRUCT_FUNCTION (called_fn_tree); if (called_fn == NULL || DECL_WEAK (called_fn_tree) + || has_inlined_assembly (called_fn) + || !is_leaf_function (called
[Patch, MIPS] Patch to fix MIPS optimization bug in combine
A bug was reported against the GCC MIPS64 compiler that involves a bad combine and this patch fixes the bug. When using '-fexpensive-optimizations -march=mips64r2 -mabi=64' GCC is combining these instructions: (insn 13 12 14 2 (set (reg:DI 206 [ *last_3(D)+-4 ]) (zero_extend:DI (subreg/s/u:SI (reg:DI 196 [ *last_3(D)+-4 ]) 4))) x.c:21 212 {*zero_extendsidi2_dext} (nil)) (insn 15 14 16 2 (set (reg:DI 208) (and:DI (reg:DI 207) (const_int 4294967295 [0x]))) x.c:21 182 {*anddi3} (expr_list:REG_DEAD (reg:DI 207) (nil))) (jump_insn 16 15 17 2 (set (pc) (if_then_else (ne (reg:DI 206 [ *last_3(D)+-4 ]) (reg:DI 208)) (label_ref:DI 35) (pc))) x.c:21 473 {*branch_equalitydi} (expr_list:REG_DEAD (reg:DI 208) (expr_list:REG_DEAD (reg:DI 206 [ *last_3(D)+-4 ]) (int_list:REG_BR_PROB 8010 (nil -> 35) Into: (jump_insn 16 15 17 2 (set (pc) (if_then_else (ne (subreg:SI (reg:DI 207) 4) (subreg:SI (reg:DI 196 [ *last_3(D)+-4 ]) 4)) (label_ref:DI 35) (pc))) x.c:21 472 {*branch_equalitysi} (expr_list:REG_DEAD (reg:DI 207) (int_list:REG_BR_PROB 8010 (nil))) -> 35) The problem is that the jump_insn on MIPS64 generates a beq/bne instruction that compares the entire 64 bit registers and there is no option to only look at the bottom 32 bits. When we got rid of insn 15 we lost the AND that cleared the upper 32 bits of one of the registers and the program fails. My solution was to not allow subregs in the conditional jump instruction. Here is the patch and a test case and I ran the GCC testsuite with no regressions. OK to checkin? Steve Ellcey sell...@imgtec.com 2015-10-21 Steve Ellcey * mips.c (mips_legitimate_combined_insn): New function. (TARGET_LEGITIMATE_COMBINED_INSN): New macro. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 0b4a5fa..4338628 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -19606,6 +19606,26 @@ mips_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class) return GR_REGS; return allocno_class; } + +/* Implement TARGET_LEGITIMATE_COMBINED_INSN */ + +static bool +mips_legitimate_combined_insn (rtx_insn* insn) +{ + /* If we do a conditional jump involving register compares do not allow + subregs because beq/bne will always compare the entire register. + This should only be an issue with N32/N64 ABI's that do a 32 bit + compare and jump. */ + + if (any_condjump_p (insn)) +{ + rtx cond = XEXP (SET_SRC (pc_set (insn)), 0); + if (GET_RTX_CLASS (GET_CODE (cond)) == RTX_COMPARE + || GET_RTX_CLASS (GET_CODE (cond)) == RTX_COMM_COMPARE) + return !SUBREG_P (XEXP (cond, 0)) && !SUBREG_P (XEXP (cond, 1)); +} + return true; +} /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP @@ -19868,6 +19888,9 @@ mips_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class) #undef TARGET_HARD_REGNO_SCRATCH_OK #define TARGET_HARD_REGNO_SCRATCH_OK mips_hard_regno_scratch_ok +#undef TARGET_LEGITIMATE_COMBINED_INSN +#define TARGET_LEGITIMATE_COMBINED_INSN mips_legitimate_combined_insn + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-mips.h" 2015-10-21 Steve Ellcey * gcc.dg/combine-subregs.c: New test. diff --git a/gcc/testsuite/gcc.dg/combine-subregs.c b/gcc/testsuite/gcc.dg/combine-subregs.c index e69de29..c480f88 100644 --- a/gcc/testsuite/gcc.dg/combine-subregs.c +++ b/gcc/testsuite/gcc.dg/combine-subregs.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fexpensive-optimizations" } */ + +#include +#include + +void __attribute__ ((noinline)) +foo (uint64_t state, uint32_t last) +{ + if (state == last) abort (); +} + +/* This function may do a bad comparision by trying to + use SUBREGS during the compare on machines where comparing + two registers always compares the entire register regardless + of mode. */ + +int __attribute__ ((noinline)) +compare (uint64_t state, uint32_t *last, uint8_t buf) +{ +if (*last == ((state | buf) & 0x)) { + foo (state, *last); +return 0; +} +return 1; +} + +int +main(int argc, char **argv) { +uint64_t state = 0xF0100U; +uint32_t last = 0x101U; +int ret= compare(state, &last, 0x01); +if (ret != 0) + abort (); +exit (0); +}
Re: [Patch, MIPS] Frame header optimization for MIPS (part 2)
On Wed, 2015-10-21 at 20:45 +0200, Bernd Schmidt wrote: > On 10/21/2015 07:38 PM, Mike Stump wrote: > > On Oct 20, 2015, at 3:55 PM, Bernd Schmidt wrote: > >> There are many unnecessary parentheses in this patch which should be > >> removed. > > > > I can smell a -Wall patch. :-) > > Not really, because -Wall isn't in the business of enforcing a coding style. > > > Bernd Hm, how about a separate warning that wasn't part of -Wall but could still be used by GCC or other products that wanted to enforce a 'no unnecessary parenthesis' coding style. Not that I'm volunteering. Steve Ellcey sell...@imgtec.com
Re: [Patch, MIPS] Patch to fix MIPS optimization bug in combine
Andrew, Here is the new patch that I am currently testing with your change and incorporating Eric's comment. I included both test cases but renamed yours and put it into gcc.dg/torture. Does the code in combine.c to address Eric's comment look OK to you? Steve Ellcey steve.ell...@imgtec.com 2015-10-21 Steve Ellcey Andrew Pinski * combine.c (simplify_comparison): Use gen_lowpart_or_truncate instead of gen_lowpart when we had a truncating and. diff --git a/gcc/combine.c b/gcc/combine.c index fd3e19c..7568ebb 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -11529,8 +11529,8 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1) tmode != GET_MODE (op0); tmode = GET_MODE_WIDER_MODE (tmode)) if ((unsigned HOST_WIDE_INT) c0 == GET_MODE_MASK (tmode)) { - op0 = gen_lowpart (tmode, inner_op0); - op1 = gen_lowpart (tmode, inner_op1); + op0 = gen_lowpart_or_truncate (tmode, inner_op0); + op1 = gen_lowpart_or_truncate (tmode, inner_op1); code = unsigned_condition (code); changed = 1; break; @@ -12048,12 +12048,9 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1) & GET_MODE_MASK (mode)) + 1)) >= 0 && const_op >> i == 0 - && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode - && (TRULY_NOOP_TRUNCATION_MODES_P (tmode, GET_MODE (op0)) - || (REG_P (XEXP (op0, 0)) - && reg_truncated_to_mode (tmode, XEXP (op0, 0) + && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode) { - op0 = gen_lowpart (tmode, XEXP (op0, 0)); + op0 = gen_lowpart_or_truncate (tmode, XEXP (op0, 0)); continue; } 2015-10-21 Steve Ellcey Andrew Pinski * gcc.dg/torture/pr67736.c: New test. * gcc.dg/combine-subregs.c: New test. diff --git a/gcc/testsuite/gcc.dg/combine-subregs.c b/gcc/testsuite/gcc.dg/combine-subregs.c index e69de29..c480f88 100644 --- a/gcc/testsuite/gcc.dg/combine-subregs.c +++ b/gcc/testsuite/gcc.dg/combine-subregs.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fexpensive-optimizations" } */ + +#include +#include + +void __attribute__ ((noinline)) +foo (uint64_t state, uint32_t last) +{ + if (state == last) abort (); +} + +/* This function may do a bad comparision by trying to + use SUBREGS during the compare on machines where comparing + two registers always compares the entire register regardless + of mode. */ + +int __attribute__ ((noinline)) +compare (uint64_t state, uint32_t *last, uint8_t buf) +{ +if (*last == ((state | buf) & 0x)) { + foo (state, *last); +return 0; +} +return 1; +} + +int +main(int argc, char **argv) { +uint64_t state = 0xF0100U; +uint32_t last = 0x101U; +int ret= compare(state, &last, 0x01); +if (ret != 0) + abort (); +exit (0); +} diff --git a/gcc/testsuite/gcc.dg/torture/pr67736.c b/gcc/testsuite/gcc.dg/torture/pr67736.c index e69de29..b45874e 100644 --- a/gcc/testsuite/gcc.dg/torture/pr67736.c +++ b/gcc/testsuite/gcc.dg/torture/pr67736.c @@ -0,0 +1,32 @@ +/* { dg-do run } */ + +#include + +typedef unsigned long long uint64_t; +void f(uint64_t *a, uint64_t aa) __attribute__((noinline)); +void f(uint64_t *a, uint64_t aa) +{ + uint64_t new_value = aa; + uint64_t old_value = *a; + int bit_size = 32; +uint64_t mask = (uint64_t)(unsigned)(-1); +uint64_t tmp = old_value & mask; +new_value &= mask; +/* On overflow we need to add 1 in the upper bits */ +if (tmp > new_value) +new_value += 1ull<
[PATCH] Fix PR rtl-optimization/67736 in combine.c
This is a new patch to fix PR rtl-optimization/67736. My original patch was MIPS specific and that was rejected and Andrew Pinski pointed me to a patch he had sent out several years ago but that was never checked in. I updated the patch and tested it with no regressions and would like to check it in. I addressed Eric Botcazou comments from the original patch and also included the (updated) test case that Andrew had as well as one I created. OK to checkin? Steve Ellcey sell...@imgtec.com My original patch and the follow up discussion is at: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02111.html Andrews original patch is at: https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00401.html 2015-10-21 Steve Ellcey Andrew Pinski PR rtl-optimization/67736 * combine.c (simplify_comparison): Use gen_lowpart_or_truncate instead of gen_lowpart when we had a truncating and. diff --git a/gcc/combine.c b/gcc/combine.c index fd3e19c..7568ebb 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -11529,8 +11529,8 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1) tmode != GET_MODE (op0); tmode = GET_MODE_WIDER_MODE (tmode)) if ((unsigned HOST_WIDE_INT) c0 == GET_MODE_MASK (tmode)) { - op0 = gen_lowpart (tmode, inner_op0); - op1 = gen_lowpart (tmode, inner_op1); + op0 = gen_lowpart_or_truncate (tmode, inner_op0); + op1 = gen_lowpart_or_truncate (tmode, inner_op1); code = unsigned_condition (code); changed = 1; break; @@ -12048,12 +12048,9 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1) & GET_MODE_MASK (mode)) + 1)) >= 0 && const_op >> i == 0 - && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode - && (TRULY_NOOP_TRUNCATION_MODES_P (tmode, GET_MODE (op0)) - || (REG_P (XEXP (op0, 0)) - && reg_truncated_to_mode (tmode, XEXP (op0, 0) + && (tmode = mode_for_size (i, MODE_INT, 1)) != BLKmode) { - op0 = gen_lowpart (tmode, XEXP (op0, 0)); + op0 = gen_lowpart_or_truncate (tmode, XEXP (op0, 0)); continue; } 2015-10-21 Steve Ellcey Andrew Pinski PR rtl-optimization/67736 * gcc.dg/torture/pr67736.c: New test. * gcc.dg/combine-subregs.c: New test. diff --git a/gcc/testsuite/gcc.dg/combine-subregs.c b/gcc/testsuite/gcc.dg/combine-subregs.c index e69de29..c480f88 100644 --- a/gcc/testsuite/gcc.dg/combine-subregs.c +++ b/gcc/testsuite/gcc.dg/combine-subregs.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fexpensive-optimizations" } */ + +#include +#include + +void __attribute__ ((noinline)) +foo (uint64_t state, uint32_t last) +{ + if (state == last) abort (); +} + +/* This function may do a bad comparision by trying to + use SUBREGS during the compare on machines where comparing + two registers always compares the entire register regardless + of mode. */ + +int __attribute__ ((noinline)) +compare (uint64_t state, uint32_t *last, uint8_t buf) +{ +if (*last == ((state | buf) & 0x)) { + foo (state, *last); +return 0; +} +return 1; +} + +int +main(int argc, char **argv) { +uint64_t state = 0xF0100U; +uint32_t last = 0x101U; +int ret= compare(state, &last, 0x01); +if (ret != 0) + abort (); +exit (0); +} diff --git a/gcc/testsuite/gcc.dg/torture/pr67736.c b/gcc/testsuite/gcc.dg/torture/pr67736.c index e69de29..b45874e 100644 --- a/gcc/testsuite/gcc.dg/torture/pr67736.c +++ b/gcc/testsuite/gcc.dg/torture/pr67736.c @@ -0,0 +1,32 @@ +/* { dg-do run } */ + +#include + +typedef unsigned long long uint64_t; +void f(uint64_t *a, uint64_t aa) __attribute__((noinline)); +void f(uint64_t *a, uint64_t aa) +{ + uint64_t new_value = aa; + uint64_t old_value = *a; + int bit_size = 32; +uint64_t mask = (uint64_t)(unsigned)(-1); +uint64_t tmp = old_value & mask; +new_value &= mask; +/* On overflow we need to add 1 in the upper bits */ +if (tmp > new_value) +new_value += 1ull<
Re: [PATCH] unconditionally compile most of the delay slot code
On Wed, 2015-10-21 at 11:13 -0400, tbsaunde+...@tbsaunde.org wrote: > From: Trevor Saunders > > Hi, > > $subject > > bootstrapped+ regtested x86_64-linux-gnu, I wouldn't mind a second pair of > eyes > on this one given its not totally trivial. > > Trev > > gcc/ChangeLog: > > 2015-10-20 Trevor Saunders > > * cfgrtl.c (pass_free_cfg::execute): Adjust. > * final.c (dbr_sequence_length): Always define. > (shorten_branches): Adjust. > * genattr-common.c (main): Always define DELAY_SLOTS. > * genattr.c (main): Unconditionally declare functions and define > macros related to delay slots. > * genattrtab.c (write_eligible_delay): Adjust. > (main): Always write out delay slot functions. > * opts.c (default_options_table): Adjust. > * reorg.c (redirect_with_delay_slots_safe_p): Likewise. > (redirect_with_delay_list_safe_p): Likewise. > (fill_simple_delay_slots): Likewise. > (fill_slots_from_thread): Likewise. > (make_return_insns): Likewise. > (dbr_schedule): Likewise. > (rest_of_handle_delay_slots): Likewise. > (pass_delay_slots::gate): Likewise. > * toplev.c (process_options): Likewise. Trevor, I am not sure if anyone else is having issues with this patch but it has broken my MIPS build. When building a cross compiler on x86 linux I get: build/genattrtab /scratch/sellcey/repos/build-bug/src/gcc/gcc/common.md /scratch/sellcey/repos/build-bug/src/gcc/gcc/config/mips/mips.md insn-conditions.md \ -Atmp-attrtab.c -Dtmp-dfatab.c -Ltmp-latencytab.c genattrtab: Internal error: abort in write_eligible_delay, at genattrtab.c:4524 make[1]: *** [s-attrtab] Error 1 make[1]: Leaving directory `/scratch/sellcey/repos/build-bug/obj-mips-mti-linux-gnu/initial_gcc/gcc' make: *** [all-gcc] Error 2 Error: Make command failed, stopping build. Any thoughts on what I should look at? Steve Ellcey sell...@imgtec.com
Re: [PATCH] Fix PR rtl-optimization/67736 in combine.c
On Thu, 2015-10-22 at 13:46 -0500, Segher Boessenkool wrote: > One nit and maybe a problem: > > > --- a/gcc/testsuite/gcc.dg/combine-subregs.c > > +++ b/gcc/testsuite/gcc.dg/combine-subregs.c > > @@ -0,0 +1,36 @@ > > +/* { dg-do run } */ > > +/* { dg-options "-O2 -fexpensive-optimizations" } */ > > -fexpensive-optimizations is default at -O2. You are right, the original report I got mentioned this flag and then I was comparing -fexpensive-optimization and -fno-expensive-optimization so I didn't notice that it was on by default. I will remove it from the test. > > + > > +#include > > Does every target have that header? Shouldn't it be ? > > > --- a/gcc/testsuite/gcc.dg/torture/pr67736.c > > +++ b/gcc/testsuite/gcc.dg/torture/pr67736.c > > @@ -0,0 +1,32 @@ > > +/* { dg-do run } */ > > + > > +#include > > + > > And here you don't need inttypes at all? Confused. > > > Segher Not including the include in pr67736.c was an oversight, I meant to include it. I didn't notice that Andrew had put in his own definition of uint64_t into this test so it didn't need any includes, that is why it still worked. But you are right I should be using stdint.h instead of inttypes.h. It looks like most tests have: /* { dg-do run { target { stdint_types } } } */ #include So I will use that in both tests and remove the local definition of uint64_t from pr67736.c. Steve Ellcey sell...@imgtec.com
[Patch] Update email address
I am checking in this patch to update my email address. The sell...@mips.com address still works but I have been using sell...@imgtec.com recently as that is more 'official'. Steve Ellcey sell...@imgtec.com 2015-10-23 Steve Ellcey * MAINTAINERS: Update email address. diff --git a/MAINTAINERS b/MAINTAINERS index 3db08f1..d2f7a52 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -66,7 +66,7 @@ i386 port Jan Hubicka i386 port Uros Bizjak i386 vector ISA extns Kirill Yukhin ia64 port Jim Wilson -ia64 port Steve Ellcey +ia64 port Steve Ellcey iq2000 portNick Clifton lm32 port Sebastien Bourdeauducq m32c port DJ Delorie @@ -130,7 +130,7 @@ DJGPP DJ Delorie freebsdAndreas Tobler GNU/Hurd Thomas Schwinge hpux John David Anglin -hpux Steve Ellcey +hpux Steve Ellcey solarisRainer Orth netbsd Jason Thorpe netbsd Krister Walfridsson
Re: [Patch, MIPS] Frame header optimization for MIPS (part 2)
Just to follow up on this string, here is a new version of the patch with the extraneous parenthesis removed. Steve Ellcey sell...@imgtec.com 2015-10-23 Steve Ellcey * frame-header-opt.c (gate): Check for optimize > 0. (has_inlined_assembly): New function. (needs_frame_header_p): Remove is_leaf_function check, add argument type check. (callees_functions_use_frame_header): Add is_leaf_function and has_inlined_assembly calls.. (set_callers_may_not_allocate_frame): New function. (frame_header_opt): Add is_leaf_function call, add set_callers_may_not_allocate_frame call. * config/mips/mips.c (mips_compute_frame_info): Add check to see if callee saved regs can be put in frame header. (mips_expand_prologue): Add check to see if step1 is zero, fix cfa restores when using frame header to store regs. (mips_can_use_return_insn): Check to see if registers are stored in frame header. * config/mips/mips.h (machine_function): Add callers_may_not_allocate_frame and use_frame_header_for_callee_saved_regs fields. diff --git a/gcc/config/mips/frame-header-opt.c b/gcc/config/mips/frame-header-opt.c index 7c7b1f2..2cf589d 100644 --- a/gcc/config/mips/frame-header-opt.c +++ b/gcc/config/mips/frame-header-opt.c @@ -79,7 +79,7 @@ public: /* This optimization has no affect if TARGET_NEWABI. If optimize is not at least 1 then the data needed for the optimization is not available and nothing will be done anyway. */ - return TARGET_OLDABI && flag_frame_header_optimization; + return TARGET_OLDABI && flag_frame_header_optimization && optimize > 0; } virtual unsigned int execute (function *) { return frame_header_opt (); } @@ -125,6 +125,29 @@ is_leaf_function (function *fn) return true; } +/* Return true if this function has inline assembly code or if we cannot + be certain that it does not. False if know that there is no inline + assembly. */ + +static bool +has_inlined_assembly (function *fn) +{ + basic_block bb; + gimple_stmt_iterator gsi; + + /* If we do not have a cfg for this function be conservative and assume + it is may have inline assembly. */ + if (fn->cfg == NULL) +return true; + + FOR_EACH_BB_FN (bb, fn) +for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + if (gimple_code (gsi_stmt (gsi)) == GIMPLE_ASM) + return true; + + return false; +} + /* Return true if this function will use the stack space allocated by its caller or if we cannot determine for certain that it does not. */ @@ -136,20 +159,26 @@ needs_frame_header_p (function *fn) if (fn->decl == NULL) return true; - if (fn->stdarg || !is_leaf_function (fn)) + if (fn->stdarg) return true; for (t = DECL_ARGUMENTS (fn->decl); t; t = TREE_CHAIN (t)) { if (!use_register_for_decl (t)) - return true; + return true; + + /* Some 64 bit types may get copied to general registers using the frame +header, see mips_output_64bit_xfer. Checking for SImode only may be + overly restrictive but it is gauranteed to be safe. */ + if (DECL_MODE (t) != SImode) + return true; } return false; } -/* Returns TRUE if the argument stack space allocated by function FN is used. - Returns FALSE if the space is needed or if the need for the space cannot +/* Return true if the argument stack space allocated by function FN is used. + Return false if the space is needed or if the need for the space cannot be determined. */ static bool @@ -177,6 +206,8 @@ callees_functions_use_frame_header (function *fn) called_fn = DECL_STRUCT_FUNCTION (called_fn_tree); if (called_fn == NULL || DECL_WEAK (called_fn_tree) + || has_inlined_assembly (called_fn) + || !is_leaf_function (called_fn) || !called_fn->machine->does_not_use_frame_header) return true; } @@ -188,6 +219,41 @@ callees_functions_use_frame_header (function *fn) return false; } +/* Set the callers_may_not_allocate_frame flag for any function which + function FN calls because FN may not allocate a frame header. */ + +static void +set_callers_may_not_allocate_frame (function *fn) +{ + basic_block bb; + gimple_stmt_iterator gsi; + gimple *stmt; + tree called_fn_tree; + function *called_fn; + + if (fn->cfg == NULL) +return; + + FOR_EACH_BB_FN (bb, fn) +{ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + stmt = gsi_stmt (gsi); + if (is_gimple_call (stmt)) + { + called_fn_tree = gimple_call_fndecl (stmt); + if (called_fn_tree != NULL) + { +
[Patch] Change to argument promotion in fixed conversion library calls
This is part one of a two part patch I want to checkin in order to improve argument passing on MIPS. Right now MIPS defines TARGET_PROMOTE_PROTOTYPES as true and so short and char arguments are getting promoted in the caller and callee functions. The ABI requires callers to do the promotion and so I want to remove the definition of TARGET_PROMOTE_PROTOTYPES which will get rid of the promotion done in the callee. This matches what LLVM and the MIPS Greenhills compilers do. When I made this change I had one regression in the GCC testsuite (gcc.dg/fixed-point/convert-sat.c). I tracked this down to the fact that emit_library_call_value_1 does not do any argument promotion because it does not have the original tree type information for library calls. It only knows about modes. I can't change emit_library_call_value_1 to do the promotion because it does not know whether to do a signed or unsigned promotion, but expand_fixed_convert could do the conversion before calling emit_library_call_value_1 and that is what this patch does. Note that if a target uses one of the default argument promotion functions like default_promote_function_mode or default_promote_function_mode_always_promote, this change will have no affect because they call promote_mode and if promote_mode is called with a NULL_TREE as the first argument it returns the original mode. For target specific promote functions, this change should just cause the promotion to be done in expand_fixed_convert instead of in emit_library_call_value_1 and with the proper setting of unsignedp. This patch by itself will not fix my MIPS bug because it uses one of the default promote functions. Part two of this patch will be to define the TARGET_PROMOTE_FUNCTION_MODE macro on MIPS so that it promotes QI and HI to SI even if the type tree is NULL and then everything will work. The 'real' long term fix for this problem is to have tree types for builtin functions so the proper promotions can always be done but that is a fairly large change that I am not willing to tackle right now and it could probably not be done in time for GCC 6.0 anyway. See https://gcc.gnu.org/ml/gcc/2015-10/msg00234.html for more discussion of this change. Is this part of the patch OK to checkin? Tested on mips-mti-linux-gnu. Steve Ellcey sell...@imgtec.com 2015-11-06 Steve Ellcey * optabs.c (expand_fixed_convert): Call promote_function_mode. diff --git a/gcc/optabs.c b/gcc/optabs.c index fdcdc6a..964e92a 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -4880,6 +4880,24 @@ expand_fixed_convert (rtx to, rtx from, int uintp, int satp) libfunc = convert_optab_libfunc (tab, to_mode, from_mode); gcc_assert (libfunc); + if (SCALAR_INT_MODE_P (from_mode)) +{ + /* If we need to promote the integer function argument we need to do + it here instead of inside emit_library_call_value because here we + know if we should be doing a signed or unsigned promotion. */ + + machine_mode arg_mode; + int unsigned_p = 0; + + arg_mode = promote_function_mode (NULL_TREE, from_mode, + &unsigned_p, NULL_TREE, 0); + if (arg_mode != from_mode) + { + from = convert_to_mode (arg_mode, from, uintp); + from_mode = arg_mode; + } +} + start_sequence (); value = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST, to_mode, 1, from, from_mode);
Re: [Patch] Change to argument promotion in fixed conversion library calls
On Fri, 2015-11-06 at 20:14 +0100, Bernd Schmidt wrote: > > > + if (SCALAR_INT_MODE_P (from_mode)) > > +{ > > + /* If we need to promote the integer function argument we need to do > > Extra space at the start of the comment. > > > + it here instead of inside emit_library_call_value because here we > > + know if we should be doing a signed or unsigned promotion. */ > > + > > + machine_mode arg_mode; > > + int unsigned_p = 0; > > + > > + arg_mode = promote_function_mode (NULL_TREE, from_mode, > > + &unsigned_p, NULL_TREE, 0); > > + if (arg_mode != from_mode) > > + { > > + from = convert_to_mode (arg_mode, from, uintp); > > + from_mode = arg_mode; > > + } > > +} > > Move this into a separate function (prepare_libcall_arg)? I'll think > about it over the weekend and let others chime in if they want, but I > think I'll probably end up approving it with that change. > > > Bernd Are you thinking of a simple function that is called on all targets or a target specific function? Maybe a target specific function would be safer. We could have: from = targetm.prepare_libcall_arg (from, uintp); from_mode = GET_MODE (from); And have the default prepare_libcall_arg simply return the original 'from' rtx value. We could also pass some other arguments, maybe the library call rtx (fun) and an integer to specify what argument this is (1, 2, 3, ...) if we wanted to try and generalize it even more. Steve Ellcey sell...@imgtec.com
Re: [Patch] Change to argument promotion in fixed conversion library calls
On Fri, 2015-11-06 at 20:29 +0100, Bernd Schmidt wrote: > On 11/06/2015 08:27 PM, Steve Ellcey wrote: > > > > Are you thinking of a simple function that is called on all targets or a > > target specific function? Maybe a target specific function would be > > safer. > > No, I think just what you have there is probably sufficient. > > > Bernd Bernd, Here is a version with the code moved into a new function. How does this look? 2015-11-09 Steve Ellcey * optabs.c (prepare_libcall_arg): New function. (expand_fixed_convert): Add call to prepare_libcall_arg. diff --git a/gcc/optabs.c b/gcc/optabs.c index fdcdc6a..fb25f90 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -4838,6 +4838,33 @@ expand_fix (rtx to, rtx from, int unsignedp) } } + +/* Promote integer arguments for a libcall if necessary. + emit_library_call_value cannot do the promotion because it does not + know if it should do a signed or unsigned promotion. This is because + there are no tree types defined for libcalls. */ + +static rtx +prepare_libcall_arg (rtx arg, int uintp) +{ + machine_mode mode = GET_MODE (arg); + machine_mode arg_mode; + if (SCALAR_INT_MODE_P (mode)) +{ + /* If we need to promote the integer function argument we need to do + it here instead of inside emit_library_call_value because in + emit_library_call_value we don't know if we should do a signed or + unsigned promotion. */ + + int unsigned_p = 0; + arg_mode = promote_function_mode (NULL_TREE, mode, + &unsigned_p, NULL_TREE, 0); + if (arg_mode != mode) + return convert_to_mode (arg_mode, arg, uintp); +} +return arg; +} + /* Generate code to convert FROM or TO a fixed-point. If UINTP is true, either TO or FROM is an unsigned integer. If SATP is true, we need to saturate the result. */ @@ -4880,6 +4907,9 @@ expand_fixed_convert (rtx to, rtx from, int uintp, int satp) libfunc = convert_optab_libfunc (tab, to_mode, from_mode); gcc_assert (libfunc); + from = prepare_libcall_arg (from, uintp); + from_mode = GET_MODE (from); + start_sequence (); value = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST, to_mode, 1, from, from_mode);
Re: [Patch] Change to argument promotion in fixed conversion library calls
On Mon, 2015-11-09 at 21:47 +0100, Bernd Schmidt wrote: > On 11/09/2015 05:59 PM, Steve Ellcey wrote: > > Here is a version with the code moved into a new function. How does > > this look? > > > > 2015-11-09 Steve Ellcey > > > > * optabs.c (prepare_libcall_arg): New function. > > (expand_fixed_convert): Add call to prepare_libcall_arg. > > Hold on a moment - I see that emit_library_call_value_1 calls > promote_function_mode for arguments. Can you investigate why that > doesn't do what you need? > > > Bernd emit_library_call_value_1 has no way of knowing if the promotion should be signed or unsigned because it has a mode (probably QImode or HImode) that it knows may need to be promoted to SImode but it has no way to know if that should be a signed or unsigned promotion because it has no tree type information about the library call argument types. Right now it guesses based on the return type but it may guess wrong when converting an unsigned int to a signed fixed type or visa versa. By doing the promotion in expand_fixed_convert GCC can use the uintp argument to ensure that the signedness of the promotion is done correctly. We could pass that argument into emit_library_call_value_1 so it can do the correct promotion but that would require changing the argument list for emit_library_call and emit_library_call_value_1 and changing all the other call locations for those functions and that seemed like overkill. Steve Ellcey
[Patch, MIPS] Remove definition of TARGET_PROMOTE_PROTOTYPES
This patch removes the definition of TARGET_PROMOTE_PROTOTYPES from MIPS, where it was defined as true, so that it now defaults to false. Currently MIPS does prototype promotion in the caller and the callee and this patch removes the TARGET_PROMOTE_PROTOTYPES macro definition so that the promotion is only done in the caller (due to PROMOTE_MODE being defined). This does not break the ABI which requires the caller to do promotions anyway. (See https://gcc.gnu.org/ml/gcc/2015-10/msg00223.html). This change also causes GCC to match what the LLVM and Greenhills compilers already do on MIPS. After removing this macro I had three regressions, two were just tests that needed changing but one was a bug (gcc.dg/fixed-point/convert-sat.c). This test was calling a library function to convert a signed char into an unsigned fixed type and because we don't have tree type information about libcalls GCC cannot do the ABI required type promotion on those calls that it does on normal user defined calls. In fact promote_mode in explow.c expicitly returns without doing anything if no type is given it. Before this change it didn't matter on MIPS because the callee did the same promotion that the caller was supposed to have done before using the argument. Now that callee code is gone we depend on the caller doing the correct promotion and that was not happening. I submitted and checked in another patch to optabs.c (See https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00704.html) to provide me with the infrastructure to do the correct type promotion in expand_fixed and this patch redefines TARGET_PROMOTE_FUNCTION_MODE to return the needed promotion mode even when type is NULL_TREE. When type is set it does the same thing as it used to do. This change allows me to remote the definition of TARGET_PROMOTE_PROTOTYPES without the convert-sat.c test failing. The two tests that I changed are gcc.dg/tree-ssa/ssa-fre-4.c and gcc.target/mips/ext-2.c. ssa-fre-4.c no longer applies to MIPS now that we do not define TARGET_PROMOTE_PROTOTYPES so I removed the MIPS target from it. ext-2.c now generates an srl instruction instead of a dext instruction but the number of instructions has not changed and I updated the scan checks. Tested on mips-mti-linux-gnu with no unfixed regressions. OK to checkin? Steve Ellcey sell...@imgtec.com 2015-11-10 Steve Ellcey * config/mips/mips.c (mips_promote_function_mode): New function. (TARGET_PROMOTE_FUNCTION_MODE): Define as above function. (TARGET_PROMOTE_PROTOTYPES): Remove. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 9880b23..e9c3830 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -19760,6 +19760,32 @@ mips_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class) return GR_REGS; return allocno_class; } + +/* Implement TARGET_PROMOTE_FUNCTION_MODE */ + +/* This function is equivalent to default_promote_function_mode_always_promote + except that it returns a promoted mode even if type is NULL_TREE. This is + needed by libcalls which have no type (only a mode) such as fixed conversion + routines that take a signed or unsigned char/short argument and convert it + to a fixed type. */ + +static machine_mode +mips_promote_function_mode (const_tree type ATTRIBUTE_UNUSED, +machine_mode mode, +int *punsignedp ATTRIBUTE_UNUSED, +const_tree fntype ATTRIBUTE_UNUSED, +int for_return ATTRIBUTE_UNUSED) +{ + int unsignedp; + + if (type != NULL_TREE) +return promote_mode (type, mode, punsignedp); + + unsignedp = *punsignedp; + PROMOTE_MODE (mode, unsignedp, type); + *punsignedp = unsignedp; + return mode; +} /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP @@ -19864,10 +19890,7 @@ mips_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class) #define TARGET_GIMPLIFY_VA_ARG_EXPR mips_gimplify_va_arg_expr #undef TARGET_PROMOTE_FUNCTION_MODE -#define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promote -#undef TARGET_PROMOTE_PROTOTYPES -#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true - +#define TARGET_PROMOTE_FUNCTION_MODE mips_promote_function_mode #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE mips_function_value #undef TARGET_LIBCALL_VALUE 2015-11-10 Steve Ellcey * gcc.dg/tree-ssa/ssa-fre-4.c: Remove mips*-*-* target. * gcc.target/mips/ext-2.c: Update scan checks. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c index 02b6719..5a7588f 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c @@ -1,6 +1,6 @@ /* If the target returns false for TARGET_PROMOTE_PROTOTYPES, then there will be no casts for FRE to eliminate and the test will fail. */ -/* { dg-
[Patch] Fix bug for frame instructions in annulled delay slots
While doing more testing of the MIPS frame header optimization I checked in earlier, I found I could not build the elf toolchain when I turned the optimization on by default. This is because GCC was putting a frame related instruction (the save of the return address) into an annulled delay slot. This only happens when the function prologue is moved after a branch by the shrink wrap optimization and it caused create_cfi_notes to abort because the code that generates the CFI information cannot handle stack related annulled instructions. This never occurred when I did my testing with the linux toolchain because I did not use -Os (which caused the use of instructions with annulled delay slots) and because -O2 does more inlining than -Os which meant there were fewer places where the return address could be saved in the callers frame header without the callee having to allocate its own stack space. The fix for this is not in mips specific code but in reorg.c where we restrict GCC from putting any frame related instructions into annulled delay slots. There was already code in reorg.c to prevent frame related instructions from going into normal delay slots but that code did not seem to cover all the unnulled delay slot situations. This patch extends the code to not put frame related instructions into annulled delay slots. Tested on MIPS with linux and elf toolchain builds and testsuite runs. OK to checkin? Steve Ellcey sell...@imgtec.com 2015-12-07 Steve Ellcey * reorg.c (optimize_skip): Do not put frame related instructions in annulled delay slots. (steal_delay_list_from_target): Ditto. (fill_slots_from_thread): Ditto. diff --git a/gcc/reorg.c b/gcc/reorg.c index cc68d6b..bab5f49 100644 --- a/gcc/reorg.c +++ b/gcc/reorg.c @@ -739,6 +739,7 @@ optimize_skip (rtx_jump_insn *insn, vec *delay_list) || recog_memoized (trial) < 0 || (! eligible_for_annul_false (insn, 0, trial, flags) && ! eligible_for_annul_true (insn, 0, trial, flags)) + || RTX_FRAME_RELATED_P (trial) || can_throw_internal (trial)) return; @@ -1126,7 +1127,13 @@ steal_delay_list_from_target (rtx_insn *insn, rtx condition, rtx_sequence *seq, trial, flags))) { if (must_annul) - used_annul = 1; + { + /* Frame related instructions cannot go into annulled delay +slots, it messes up the dwarf info. */ + if (RTX_FRAME_RELATED_P (trial)) + return; + used_annul = 1; + } rtx_insn *temp = copy_delay_slot_insn (trial); INSN_FROM_TARGET_P (temp) = 1; add_to_delay_list (temp, &new_delay_list); @@ -2464,9 +2471,9 @@ fill_slots_from_thread (rtx_jump_insn *insn, rtx condition, if (eligible_for_delay (insn, *pslots_filled, trial, flags)) goto winner; } - else if (0 - || (ANNUL_IFTRUE_SLOTS && ! thread_if_true) - || (ANNUL_IFFALSE_SLOTS && thread_if_true)) + else if (!RTX_FRAME_RELATED_P (trial) \ + && ((ANNUL_IFTRUE_SLOTS && ! thread_if_true) + || (ANNUL_IFFALSE_SLOTS && thread_if_true))) { old_trial = trial; trial = try_split (pat, trial, 0); 2015-12-07 Steve Ellcey * gcc.target/mips/wrap-delay.c: New test. diff --git a/gcc/testsuite/gcc.target/mips/wrap-delay.c b/gcc/testsuite/gcc.target/mips/wrap-delay.c index e69de29..1332a68 100644 --- a/gcc/testsuite/gcc.target/mips/wrap-delay.c +++ b/gcc/testsuite/gcc.target/mips/wrap-delay.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-options "-g -mframe-header-opt -mbranch-likely" } */ + +/* GCC was generating an ICE with the above options and -Os because + it was putting the save of $31 in two annulled delay slots but + there was only one restore since only one of the two saves could be + executed. This was correct code but it confused dwarf2cfi and + caused it to ICE when using the -g option. */ + + +enum demangle_component_type +{ + DEMANGLE_COMPONENT_TRINARY_ARG2, +}; +struct demangle_component +{ + enum demangle_component_type type; +}; +struct d_info +{ + int next_comp; + int num_comps; +}; +struct demangle_component * d_make_empty (struct d_info *di) +{ + if (di->next_comp >= di->num_comps) return ((void *)0); + ++di->next_comp; +} +struct demangle_component *d_make_comp ( + struct d_info *di, + enum demangle_component_type type, + struct demangle_component *left) +{ + struct demangle_component *p; + switch (type) +{ +case DEMANGLE_COMPONENT_TRINARY_ARG2: + if (left == ((void *)0)) return ((void *)0); +} + p = d_make_empty (di); + p->type = type; +}
Re: [Patch] Fix bug for frame instructions in annulled delay slots
On Mon, 2015-12-07 at 20:28 +0100, Bernd Schmidt wrote: > On 12/07/2015 07:54 PM, Steve Ellcey wrote: > > if (must_annul) > > - used_annul = 1; > > + { > > + /* Frame related instructions cannot go into annulled delay > > +slots, it messes up the dwarf info. */ > > + if (RTX_FRAME_RELATED_P (trial)) > > + return; > > Don't you need to use break rather than return? I am not sure about this. There is an earlier if statement in the loop that does a 'return' instead of a break (or continue) and there is a return in the 'else' part of the if that sets must_annul. Both of these are inside the loop that looks at all the instructions in the sequence 'seq'. I think the code is looking at all the instructions in the sequence and if any of them fail one of the tests in the loop (in this case require annulling) then we can't handle any of the instructions in the sequence and so we return immediately without putting any of the instructions from 'seq' in the delay slot. I believe a frame related instruction in an annulled branch needs to be handled that way too. > > > + else if (!RTX_FRAME_RELATED_P (trial) \ > > Stray backslash. That is easily fixed. > Other than that I think this is OK. There are some preexisting tests for > frame related insns already in this code. > > > Bernd >
Re: [Patch] Fix bug for frame instructions in annulled delay slots
On Mon, 2015-12-07 at 20:59 +0100, Bernd Schmidt wrote: > On 12/07/2015 08:43 PM, Steve Ellcey wrote: > > I am not sure about this. There is an earlier if statement in the loop > > that does a 'return' instead of a break (or continue) and there is a > > return in the 'else' part of the if that sets must_annul. Both of these > > are inside the loop that looks at all the instructions in the sequence > > 'seq'. I think the code is looking at all the instructions in the > > sequence and if any of them fail one of the tests in the loop (in this > > case require annulling) then we can't handle any of the instructions in > > the sequence and so we return immediately without putting any of the > > instructions from 'seq' in the delay slot. I believe a frame related > > instruction in an annulled branch needs to be handled that way too. > > Ah, I think I was looking at the other function that has the same > must_annul test (steal_delay_list_from_fallthrough). The patch is ok > without the backslash. Maybe the other function ought to be changed as > well though? > > > Bernd That would seem reasonable, though I don't have a test case for where the change to that routine would actually make a difference in the compilation. I'll create a patch and test it to make sure it doesn't cause any problems and then submit it as a follow-up to this change. Steve Ellcey sell...@imgtec.com
Re: [Patch] Fix bug for frame instructions in annulled delay slots
On Mon, 2015-12-07 at 12:30 -0700, Jeff Law wrote: > On 12/07/2015 12:28 PM, Bernd Schmidt wrote: > > On 12/07/2015 07:54 PM, Steve Ellcey wrote: > >> if (must_annul) > >> -used_annul = 1; > >> +{ > >> + /* Frame related instructions cannot go into annulled delay > >> + slots, it messes up the dwarf info. */ > >> + if (RTX_FRAME_RELATED_P (trial)) > >> +return; > > > > Don't you need to use break rather than return? > > > >> + else if (!RTX_FRAME_RELATED_P (trial) \ > > > > Stray backslash > > Other than that I think this is OK. There are some preexisting tests for > > frame related insns already in this code. > Also note there's probably port cleanup that could happen once this goes > in. IIRC the PA port (for example) explicitly disallows frame related > insns from many (most, all?) delay slots. Other targets may be doing > something similar. > > jeff If I had remembered/seen that code in pa.md earlier I might have fixed MIPS the same way. Oh well, I guess changing reorg.c is the better way to do it since it can be shared now. It looks like steal_delay_list_from_fallthrough is missing a lot of FRAME_RELATED checks (and other things) that steal_delay_list_from_target has in it (not just my changes). I think that means most of the work/fixing has been on machines like MIPS where the code in the delay slot is either always executed or only executed when the branch is taken (vs. untaken). I think steal_delay_list_from_fallthrough is only going to be used when the delay slot is executed on a branch-not-taken. Are there any machines like that? Steve Ellcey sell...@imgtec.com
[RFC] Request for comments on ivopts patch
I have an ivopts optimization question/proposal. When compiling the attached program the ivopts pass prefers the original ivs over new ivs and that causes us to generate less efficient code on MIPS. It may affect other platforms too. The Source code is a C strcmp: int strcmp (const char *p1, const char *p2) { const unsigned char *s1 = (const unsigned char *) p1; const unsigned char *s2 = (const unsigned char *) p2; unsigned char c1, c2; do { c1 = (unsigned char) *s1++; c2 = (unsigned char) *s2++; if (c1 == '\0') return c1 - c2; } while (c1 == c2); return c1 - c2; } Currently the code prefers the original ivs and so it generates code that increments s1 and s2 before doing the loads (and uses a -1 offset): : # s1_1 = PHI # s2_2 = PHI s1_6 = s1_1 + 1; c1_8 = MEM[base: s1_6, offset: 4294967295B]; s2_9 = s2_2 + 1; c2_10 = MEM[base: s2_9, offset: 4294967295B]; if (c1_8 == 0) goto ; else goto ; If I remove the cost increment for non-original ivs then GCC does the loads before the increments: : # ivtmp.6_17 = PHI # ivtmp.7_21 = PHI _25 = (void *) ivtmp.6_17; c1_8 = MEM[base: _25, offset: 0B]; _26 = (void *) ivtmp.7_21; c2_10 = MEM[base: _26, offset: 0B]; if (c1_8 == 0) goto ; else goto ; . . : ivtmp.6_14 = ivtmp.6_17 + 1; ivtmp.7_23 = ivtmp.7_21 + 1; if (c1_8 == c2_10) goto ; else goto ; This second case (without the preference for the original IV) generates better code on MIPS because the final assembly has the increment instructions between the loads and the tests of the values being loaded and so there is no delay (or less delay) between the load and use. It seems like this could easily be the case for other platforms too so I was wondering what people thought of this patch: 2015-12-08 Steve Ellcey * tree-ssa-loop-ivopts.c (determine_iv_cost): Remove preference for original ivs. diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 98dc451..26daabc 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -5818,14 +5818,6 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand) cost = cost_step + adjust_setup_cost (data, cost_base.cost); - /* Prefer the original ivs unless we may gain something by replacing it. - The reason is to make debugging simpler; so this is not relevant for - artificial ivs created by other optimization passes. */ - if (cand->pos != IP_ORIGINAL - || !SSA_NAME_VAR (cand->var_before) - || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before))) -cost++; - /* Prefer not to insert statements into latch unless there are some already (so that we do not create unnecessary jumps). */ if (cand->pos == IP_END
[Patch] Fix for MIPS PR target/65604
This is a MIPS patch to make mips_output_division obey the -fno-delayed-branch flag. Right now, with mips1 and -mcheck-zero-division, the division instruction is put into the bne delay slot even when -fno-delayed-branch is specified. This change uses a similar strategy to MIPS16 where we do the division first and then do the zero test while the division is being calculated. Tested with mips1 runs and by inspecting the code that is output. OK to checkin? Steve Ellcey sell...@imgtec.com 2015-12-09 Steve Ellcey PR target/65604 * config/mips/mips.c (mips_output_division): Check flag_delayed_branch. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 6145944..8444a91 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -13687,9 +13687,17 @@ mips_output_division (const char *division, rtx *operands) } else { - output_asm_insn ("%(bne\t%2,%.,1f", operands); - output_asm_insn (s, operands); - s = "break\t7%)\n1:"; + if (flag_delayed_branch) + { + output_asm_insn ("%(bne\t%2,%.,1f", operands); + output_asm_insn (s, operands); + s = "break\t7%)\n1:"; + } + else + { + output_asm_insn (s, operands); + s = "bne\t%2,%.,1f\n\tnop\n\tbreak\t7\n1:"; + } } } return s;
Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching
On Mon, 2019-10-07 at 12:04 +0200, Richard Biener wrote: > On Tue, Oct 1, 2019 at 1:48 PM Dmitrij Pochepko > wrote: > > > > Hi Richard, > > > > I updated patch according to all your comments. > > Also bootstrapped and tested again on x86_64-pc-linux-gnu and > > aarch64-linux-gnu, which took some time. > > > > attached v3. > > OK. > > Thanks, > Richard. Dmitrij, I checked in this patch for you. Steve Ellcey sell...@marvell.com
Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute
Thanks for the feedback Kyrill. I have updated my patch and attached the new version to this email. The one change I did not make was to remove load_pair_dw_tftf and store_pair_dw_tftf and use the load_pair and vec_store_pair patterns. Having to add two new iterators to remove two instructions didn't seem like much of an advantage and I liked having the names for tf to match the other load_pair_dw/store_pair_dw instructions. If you feel strongly about this I can go ahead and make that change though. With respect to the new tests iterating over the various options, I am not sure how that works but it does. I copied the aarch64-torture.exp file from one of the other targets and verified that it ran the tests with -O0, -O1, -O2, '-O3 -g', -Os, '-O2 -flto -fno-use-linker- plugin -flto-partition=none', and '-O2 -flto -fuse-linker-plugin -fno- fat-lto-objects'. Steve Ellcey sell...@cavium.com 2018-08-06 Steve Ellcey * config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p): New prototype. (aarch64_epilogue_uses): Ditto. * config/aarch64/aarch64.c (aarch64_attribute_table): New array. (aarch64_simd_decl_p): New function. (aarch64_reg_save_mode): New function. (aarch64_is_simd_call_p): New function. (aarch64_function_ok_for_sibcall): Check for simd calls. (aarch64_layout_frame): Check for simd function. (aarch64_gen_storewb_pair): Handle E_TFmode. (aarch64_push_regs): Use aarch64_reg_save_mode to get mode. (aarch64_gen_loadwb_pair): Handle E_TFmode. (aarch64_pop_regs): Use aarch64_reg_save_mode to get mode. (aarch64_gen_store_pair): Handle E_TFmode. (aarch64_gen_load_pair): Ditto. (aarch64_save_callee_saves): Handle different mode sizes. (aarch64_restore_callee_saves): Ditto. (aarch64_components_for_bb): Check for simd function. (aarch64_epilogue_uses): New function. (aarch64_process_components): Ditto. (aarch64_expand_prologue): Ditto. (aarch64_expand_epilogue): Ditto. (aarch64_expand_call): Ditto. (TARGET_ATTRIBUTE_TABLE): New define. * config/aarch64/aarch64.h (EPILOGUE_USES): Redefine. (FP_SIMD_SAVED_REGNUM_P): New macro. * config/aarch64/aarch64.md (V23_REGNUM) New constant. (simple_return): New define_expand. (load_pair_dw_tftf): New instruction. (store_pair_dw_tftf): Ditto. (loadwb_pair_): Ditto. ("storewb_pair_): Ditto. 2018-08-06 Steve Ellcey * gcc.target/aarch64/torture/aarch64-torture.exp: New file. * gcc.target/aarch64/torture/simd-abi-1.c: New test. * gcc.target/aarch64/torture/simd-abi-2.c: Ditto. * gcc.target/aarch64/torture/simd-abi-3.c: Ditto. * gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index af5db9c..99c962f 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx); bool aarch64_symbolic_address_p (rtx); bool aarch64_uimm12_shift (HOST_WIDE_INT); bool aarch64_use_return_insn_p (void); +bool aarch64_use_simple_return_insn_p (void); const char *aarch64_mangle_builtin_type (const_tree); const char *aarch64_output_casesi (rtx *); @@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx); /* Check for a legitimate floating point constant for FMOV. */ bool aarch64_float_const_representable_p (rtx); +extern int aarch64_epilogue_uses (int); + #if defined (RTX_CODE) void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode, rtx label_ref); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index fa01475..7ab2a60 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1027,6 +1027,15 @@ static const struct processor *selected_tune; /* The current tuning set. */ struct tune_params aarch64_tune_params = generic_tunings; +/* Table of machine attributes. */ +static const struct attribute_spec aarch64_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "aarch64_vector_pcs", 0, 0, true, false, false, false, NULL, NULL }, + { NULL, 0, 0, false, false, false, false, NULL, NULL } +}; + #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0) /* An ISA extension in the co-processor and main instruction set space. */ @@ -1405,6 +1414,31 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) return false; } +/* Return true if this is a definition of a vectorized simd function. */ + +static bool +aarch64_simd_decl_p (tree fndecl) +{ + if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fnd
Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute
I have a question about my own patch. In doing testing I realized that if I use the aarch64_vector_pcs attribute on a platform without SIMD (i.e. -march=armv8.1-a+nosimd) I get an ICE. That is obviously not what we want but I was wondering what the right behaviour is. We certainly don't want to generate code for a SIMD function on a target that does not support SIMD, but is it OK to declare something with the simd or aarch64_vector_pcs attribute if it is never used or called? Or should even the declaration of a function with the simd/aarch64_vector_pcs attribute result in a compilation error? I don't think we want to complain about simd declarations because those could show up in the system headers like mathcalls.h even when not used. Or should those be ifdef'ed in some way based on the __ARM_NEON macro so they don't show up when not compiling for SIMD? Any ideas on where should this check be done, I thought the TARGET_OPTION_VALID_ATTRIBUTE_P hook might be the right place, but that seems to be specific to the 'target' attribute only, not attributes in general. Steve Ellcey sell...@cavium.com
Re: [aarch64}: added variable issue rate feature for falkor
On Wed, 2018-08-15 at 12:45 +0200, Kai Tietz wrote: > > > I believe the scheduler provides hooks specifically for storing > > backend-specific scheduling state so we should > > avoid creating such static variables in aarch64.c. Can you use the > > TARGET_SCHED_*_SCHED_CONTEXT family of hooks here? > > Then it will be up to the scheduler midend to keep track of the > > state and > > between basic blocks, functions etc. > I think you refer to the ppc implementation. But if you take a closer > look to it, you will see that nevertheless such an implementation will > require global variables. > So I am not really sure it is worth to introduce the ..._SCHED_CONTEXT > API to avoid one global variable by introducing at least two others. > Neverthelss I am admit that making use of SCHED_CONTEXT could be a > general nice to have, but not necessarily a gain in that case. I think having the SCHED_CONTEXT infrastructure could be helpful to me for Thunderx2. I am looking at an issue that I think TARGET_SCHED_VARIABLE_ISSUE will help with so having the general API available would be useful. I am looking at the lbm benchmark from SPEC and I believe it is overscheduling floating point instructions, if I cut the issue rate from 4 to 2, I get better performance on this one benchmark, but not on others. According to the Thunderx2 description, we can dispatch up to 4 uops per clock, but only up to 2 FP/SIMD uops. I know dispatch is different than issue but I don't think that GCC models both. For my patch I would probably want to save the previous instruction scheduled so that if it and the current one are both FP/SIMD ops, then that is all we can issue. I might need to save several instructions, not just the last one, to get everything correct. Steve Ellcey sell...@cavium.com
Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute
On Tue, 2018-08-07 at 12:15 -0500, Segher Boessenkool wrote: > > +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } } > > */ > That's [0-7] but maybe you find [01234567] more readable here. Segher, I fixed all the issues you pointed out except this one. Since there are some uses of non consecutive numbers in one of the tests I decided to leave [01234567] instead of using [0-7]. Here is the latest version of the patch, there are no semantic changes, just syntactical ones to address the issues that you pointed out. Steve Ellcey sell...@cavium.com 2018-08-20 Steve Ellcey * config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p): New prototype. (aarch64_epilogue_uses): Ditto. * config/aarch64/aarch64.c (aarch64_attribute_table): New array. (aarch64_simd_decl_p): New function. (aarch64_reg_save_mode): New function. (aarch64_is_simd_call_p): New function. (aarch64_function_ok_for_sibcall): Check for simd calls. (aarch64_layout_frame): Check for simd function. (aarch64_gen_storewb_pair): Handle E_TFmode. (aarch64_push_regs): Use aarch64_reg_save_mode to get mode. (aarch64_gen_loadwb_pair): Handle E_TFmode. (aarch64_pop_regs): Use aarch64_reg_save_mode to get mode. (aarch64_gen_store_pair): Handle E_TFmode. (aarch64_gen_load_pair): Ditto. (aarch64_save_callee_saves): Handle different mode sizes. (aarch64_restore_callee_saves): Ditto. (aarch64_components_for_bb): Check for simd function. (aarch64_epilogue_uses): New function. (aarch64_process_components): Check for simd function. (aarch64_expand_prologue): Ditto. (aarch64_expand_epilogue): Ditto. (aarch64_expand_call): Ditto. (TARGET_ATTRIBUTE_TABLE): New define. * config/aarch64/aarch64.h (EPILOGUE_USES): Redefine. (FP_SIMD_SAVED_REGNUM_P): New macro. * config/aarch64/aarch64.md (V23_REGNUM) New constant. (simple_return): New define_expand. (load_pair_dw_tftf): New instruction. (store_pair_dw_tftf): Ditto. (loadwb_pair_): Ditto. ("storewb_pair_): Ditto. 2018-08-20 Steve Ellcey * gcc.target/aarch64/torture/aarch64-torture.exp: New file. * gcc.target/aarch64/torture/simd-abi-1.c: New test. * gcc.target/aarch64/torture/simd-abi-2.c: Ditto. * gcc.target/aarch64/torture/simd-abi-3.c: Ditto. * gcc.target/aarch64/torture/simd-abi-4.c: Ditto.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index af5db9c..99c962f 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx); bool aarch64_symbolic_address_p (rtx); bool aarch64_uimm12_shift (HOST_WIDE_INT); bool aarch64_use_return_insn_p (void); +bool aarch64_use_simple_return_insn_p (void); const char *aarch64_mangle_builtin_type (const_tree); const char *aarch64_output_casesi (rtx *); @@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx); /* Check for a legitimate floating point constant for FMOV. */ bool aarch64_float_const_representable_p (rtx); +extern int aarch64_epilogue_uses (int); + #if defined (RTX_CODE) void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode, rtx label_ref); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index fa01475..9cffec8 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1027,6 +1027,15 @@ static const struct processor *selected_tune; /* The current tuning set. */ struct tune_params aarch64_tune_params = generic_tunings; +/* Table of machine attributes. */ +static const struct attribute_spec aarch64_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "aarch64_vector_pcs", 0, 0, true, false, false, false, NULL, NULL }, + { NULL, 0, 0, false, false, false, false, NULL, NULL } +}; + #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0) /* An ISA extension in the co-processor and main instruction set space. */ @@ -1405,6 +1414,31 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) return false; } +/* Return true if this is a definition of a vectorized simd function. */ + +static bool +aarch64_simd_decl_p (tree fndecl) +{ + if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL) +return true; + if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL) +return false; + return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl; +} + +/* Return the mode a register save/restore should use. DImode for integer + registers, DFmode for FP registers in non-SIMD func
ToT build problem on Aarch64 after cfg.h change
Richard, I am having a problem building GCC after this patch: commit 2515797e5db67076d6cf7f3f185757c841f79edf Author: rguenth Date: Fri Aug 24 11:17:16 2018 + 2018-08-24 Richard Biener * cfg.h (struct control_flow_graph): Add edge_flags_allocated and bb_flags_allocated members. (auto_flag): New RAII class for allocating flags. (auto_edge_flag): New RAII class for allocating edge flags. (auto_bb_flag): New RAII class for allocating bb flags. * cfgloop.c (verify_loop_structure): Allocate temporary edge flag dynamically. * cfganal.c (dfs_enumerate_from): Remove use of visited sbitmap in favor of temporarily allocated BB flag. * hsa-brig.c: Re-order includes. * hsa-dump.c: Likewise. * hsa-regalloc.c: Likewise. * print-rtl.c: Likewise. * profile-count.c: Likewise. The failure is: In file included from /home/sellcey/gcc-m/src/gcc/gcc/config/aarch64/aarch64-speculation.cc:28:0: /home/sellcey/gcc-m/src/gcc/gcc/cfg.h: In constructor ‘auto_edge_flag::auto_edge_flag(function*)’: /home/sellcey/gcc-m/src/gcc/gcc/cfg.h:172:22: error: invalid use of incomplete type ‘struct function’ : auto_flag (&fun->cfg->edge_flags_allocated) {} It looks like this may be Aarch64 specific build problem since it is compiling a platform specific file. Is there just a missing include? Steve Ellcey sell...@cavium.com
Re: Fix MIPS builds
On Sun, 2018-08-26 at 20:56 -0600, Jeff Law wrote: > MIPS builds have been failing to build due to trying to use an > incomplete struct function type. The uses are coming from cfg.h which > typically isn't included in the target files. > > Fixed by including "backend.h" and removing the "cfg.h" inclusion. > > THere's likely other targets (aarch64?) that are suffering from a > similar problem and could likely be fixed in a similar way. I'll look > at those once I'm confident all my MIPS targets are building again. > > jeff I tried this change in gcc/config/aarch64/aarch64-speculation.cc (changing include of cfg.h to backend.h) and it fixed my build problem. Steve Ellcey
Re: Fix MIPS builds
On Mon, 2018-08-27 at 10:27 -0600, Jeff Law wrote: > External Email > > On 08/27/2018 09:55 AM, Steve Ellcey wrote: > > > > On Sun, 2018-08-26 at 20:56 -0600, Jeff Law wrote: > > > > > > MIPS builds have been failing to build due to trying to use an > > > incomplete struct function type. The uses are coming from cfg.h which > > > typically isn't included in the target files. > > > > > > Fixed by including "backend.h" and removing the "cfg.h" > > > inclusion. > > > > > > THere's likely other targets (aarch64?) that are suffering from a > > > similar problem and could likely be fixed in a similar way. I'll look > > > at those once I'm confident all my MIPS targets are building > > > again. > > > > > > jeff > > I tried this change in gcc/config/aarch64/aarch64-speculation.cc > > (changing include of cfg.h to backend.h) and it fixed my build problem. > Go ahead and commit it :-) backend.h seems like an appropriate thing to > include there. > > jeff OK, here is the patch I will check in: 2018-08-27 Steve Ellcey * config/aarch64/aarch64-speculation.cc: Replace include of cfg.h with include of backend.h. diff --git a/gcc/config/aarch64/aarch64-speculation.cc b/gcc/config/aarch64/aarch64-speculation.cc index 2dd06ae..3cd9ba0 100644 --- a/gcc/config/aarch64/aarch64-speculation.cc +++ b/gcc/config/aarch64/aarch64-speculation.cc @@ -25,7 +25,7 @@ #include "rtl.h" #include "tree-pass.h" #include "profile-count.h" -#include "cfg.h" +#include "backend.h" #include "cfgbuild.h" #include "print-rtl.h" #include "cfgrtl.h"
Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute
Ping. Any feedback from the Aarch64 maintainers? Steve Ellcey sell...@cavium.com On Mon, 2018-08-20 at 10:37 -0700, Steve Ellcey wrote: > On Tue, 2018-08-07 at 12:15 -0500, Segher Boessenkool wrote: > > > > > > > > > +/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } > > > } > > > */ > > That's [0-7] but maybe you find [01234567] more readable here. > Segher, I fixed all the issues you pointed out except this one. > Since > there are some uses of non consecutive numbers in one of the tests I > decided to leave [01234567] instead of using [0-7]. Here is the > latest version of the patch, there are no semantic changes, just > syntactical ones to address the issues that you pointed out. > > Steve Ellcey > sell...@cavium.com > > > 2018-08-20 Steve Ellcey > > * config/aarch64/aarch64-protos.h > (aarch64_use_simple_return_insn_p): > New prototype. > (aarch64_epilogue_uses): Ditto. > * config/aarch64/aarch64.c (aarch64_attribute_table): New > array. > (aarch64_simd_decl_p): New function. > (aarch64_reg_save_mode): New function. > (aarch64_is_simd_call_p): New function. > (aarch64_function_ok_for_sibcall): Check for simd calls. > (aarch64_layout_frame): Check for simd function. > (aarch64_gen_storewb_pair): Handle E_TFmode. > (aarch64_push_regs): Use aarch64_reg_save_mode to get mode. > (aarch64_gen_loadwb_pair): Handle E_TFmode. > (aarch64_pop_regs): Use aarch64_reg_save_mode to get mode. > (aarch64_gen_store_pair): Handle E_TFmode. > (aarch64_gen_load_pair): Ditto. > (aarch64_save_callee_saves): Handle different mode sizes. > (aarch64_restore_callee_saves): Ditto. > (aarch64_components_for_bb): Check for simd function. > (aarch64_epilogue_uses): New function. > (aarch64_process_components): Check for simd function. > (aarch64_expand_prologue): Ditto. > (aarch64_expand_epilogue): Ditto. > (aarch64_expand_call): Ditto. > (TARGET_ATTRIBUTE_TABLE): New define. > * config/aarch64/aarch64.h (EPILOGUE_USES): Redefine. > (FP_SIMD_SAVED_REGNUM_P): New macro. > * config/aarch64/aarch64.md (V23_REGNUM) New constant. > (simple_return): New define_expand. > (load_pair_dw_tftf): New instruction. > (store_pair_dw_tftf): Ditto. > (loadwb_pair_): Ditto. > ("storewb_pair_): Ditto. > > 2018-08-20 Steve Ellcey > > * gcc.target/aarch64/torture/aarch64-torture.exp: New file. > * gcc.target/aarch64/torture/simd-abi-1.c: New test. > * gcc.target/aarch64/torture/simd-abi-2.c: Ditto. > * gcc.target/aarch64/torture/simd-abi-3.c: Ditto. > * gcc.target/aarch64/torture/simd-abi-4.c: Ditto.
Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute
On Tue, 2018-09-04 at 17:20 +, Wilco Dijkstra wrote: > External Email > > Hi Steve, > > The latest version compiles the examples I used correctly, so it looks fine > from that perspective (but see comments below). However the key point of > the ABI is to enable better code generation when calling a vector function, > and that will likely require further changes that may conflict with this > patch. > Do you have patches for that work outstanding? It seems best to do this in > one go. > > Also did you check there is no regression in code generation for non-vector > functions? Yes, I checked for regressions in non-vector functions. They way this is written, there should be zero changes to any generated code that does not involve vector functions. I have some changes to improve code generation for SIMD functions but it is not ready to go yet. The main change I made to improve code generation is to modify TARGET_HARD_REGNO_CALL_PART_CLOBBERED to accept an instruction pointer so we could check to see if the call being made is to a 'normal' function or a 'simd' function and return the correct value. Unfortunately, there are many places where we don't have the call insn available and have to make the pessimistic guess. I also changed get_call_reg_set_usage so that it could return different default sets of what registers changed based on whether or not the function being called was normal or simd. Again, there are some places where we have to make a pessimistic guess. > > > +/* Return 1 if the register is used by the epilogue. We need to say the > + return register is used, but only after epilogue generation is complete. > + Note that in the case of sibcalls, the values "used by the epilogue" are > + considered live at the start of the called function. > + > + For SIMD functions we need to return 1 for FP registers that are saved and > + restored by a function but not zero in call_used_regs. If we do not do > + this optimizations may remove the restore of the register. */ > + > +int > +aarch64_epilogue_uses (int regno) > +{ > + if (epilogue_completed && (regno) == LR_REGNUM) > +return 1; > + if (aarch64_simd_decl_p (cfun->decl) && FP_SIMD_SAVED_REGNUM_P > (regno)) > +return 1; > + return 0; > +} > > I'm not convinced this is a good idea. It suggests GCC doesn't have the > correct set > of caller/callee-save registers for vector functions (I don't see a change to > update > CALL_USED_REGISTERS or aarch64_hard_regno_call_part_clobbered), which could > lead to all kinds of interesting issues. The problem is that there is no single correct set of caller/callee- saved registers anymore. There is one set for normal functions and a different set for simd functions. GCC isn't set up to have two different caller/callee saved ABIs on one target, there is only one CALL_USED_REGISTERS macro used to set one call_used_regs array. Should V16-V23 be set in CALL_USED_REGISTERS? For 'normal' functions, yes. For SIMD functions, no. > > +/* Return false for non-leaf SIMD functions in order to avoid > + shrink-wrapping them. Doing this will lose the necessary > + save/restore of FP registers. */ > + > +bool > +aarch64_use_simple_return_insn_p (void) > +{ > + if (aarch64_simd_decl_p (cfun->decl) && !crtl->is_leaf) > +return false; > + > + return true; > +} > > Such as this... > > Wilco
[PATCH][aarch64] Enable ifunc resolver attribute by default
I recently noticed that the GCC 'resolver' attribute used for ifunc's is not on by default for aarch64 even though all the infrastructure to support it is in place. I made memcpy an ifunc on aarch64 in glibc and am looking at possibly using it for libatomic too. For this reason I would like to enable it by default. Note that the memcpy ifunc works even when this is not enabled because glibc enables ifuncs by using the assembly language .type psuedo-op to set the resolver attribute when GCC cannot do it with an attribute. Using an ifunc in libatomic does require this to be enabled and I do not see any reason not to have it enabled by default. Tested with no regressions, OK to check in? Steve Ellcey sell...@cavium.com 2017-06-12 Steve Ellcey * config.gcc (aarch64*-*-linux*): Enable IFUNC by default. diff --git a/gcc/config.gcc b/gcc/config.gcc index a311cd95..e4caca4 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -974,6 +974,7 @@ aarch64*-*-freebsd*) tmake_file="${tmake_file} aarch64/t-aarch64 aarch64/t-aarch64-freebsd" ;; aarch64*-*-linux*) + default_gnu_indirect_function=yes tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h" tm_file="${tm_file} aarch64/aarch64-elf.h aarch64/aarch64-linux.h" tmake_file="${tmake_file} aarch64/t-aarch64 aarch64/t-aarch64-linux"
Re: [PATCH/AARCH64] Improve aarch64 conditional compare usage
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00021.html Ping. Steve Ellcey sell...@cavium.com