And some more formatting issues.
On 30/03/16 10:33, Ramana Radhakrishnan wrote:
>
>
> On 24/03/16 21:02, Charles Baylis wrote:
>> When compiling with -mlong-calls and -pg, calls to the __gnu_mcount_nc
>> function are not generated as long calls.
>>
>> This is the sequel to this patch
>> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00881.html
>>
>> This patch fixes the following problems with the previous patch.
>> . Nested functions now work (thanks to Richard E for spotting this)
>> . Thumb-1 now works
>
>>
>> This patch works by adding new patterns (one for ARM/Thumb-2 and one
>> for Thumb-1) which are placed in the prologue as a placeholder for
>> some RTL which describes the address. This is either a SYMBOL_REF for
>> targets with MOVW/MOVT, or a literal pool reference for other targets.
>> The implementation of ARM_FUNCTION_PROFILER is changed to search for
>> this insn so that the the address of the __gnu_mcount_nc function can
>> be loaded using an appropriate sequence for the target.
>
> I'm reasonably happy with the approach but there are nits.
>
>>
>> I also tried generating the profiling call sequence directly in the
>> prologue, but this requires some unpleasant hacks to prevent spurious
>> register pushes from ASM_OUTPUT_REG_PUSH.
>>
>> Tested with no new regressions on arm-unknown-linux-gnueabihf on QEMU.
>> The generated code sequences have been inspected for normal and nested
>> functions on ARM v6, ARM v7, Thumb-1, and Thumb-2 targets.
>>
>> This does not fix a regression, so I don't expect to apply it for
>> GCC6, is it OK for when stage 1 re-opens.
>
>
> I'm not sure how much testing coverage you get by just running the testsuite.
> Doing a profiled bootstrap with -mlong-calls and a regression test run for
> arm and / or thumb2 would be a useful test to do additionally - Given that
> this originally came from kernel builds with allyesconfig how important is
> this to be fixed for GCC 6 ? I'd rather take the fix into GCC 6 to get the
> kernel building again but that's something we can discuss with the RM's once
> the issues with the patch are fixed.
>>
>> gcc/ChangeLog:
>>
>> 2016-03-24 Charles Baylis <[email protected]>
>>
>> * config/arm/arm-protos.h (arm_emit_long_call_profile): New function.
>> * config/arm/arm.c (arm_emit_long_call_profile_insn): New function.
>> (arm_expand_prologue): Likewise.
>> (thumb1_expand_prologue): Likewise.
>> (arm_output_long_call_to_profile_func): Likewise.
>> (arm_emit_long_call_profile): Likewise.
>> * config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment.
>> * config/arm/arm.md (arm_long_call_profile): New pattern.
>> * config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New
>> define.
>> * config/arm/thumb1.md (thumb1_long_call_profile): New pattern.
>> * config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-03-24 Charles Baylis <[email protected]>
>>
>> * gcc.target/arm/pr69770.c: New test.
>>
>>
>> 0001-PR69770-mlong-calls-does-not-affect-calls-to-__gnu_m.patch
>>
>>
>> From 5a39451f34be9b6ca98b3460bf40d879d6ee61a5 Mon Sep 17 00:00:00 2001
>> From: Charles Baylis <[email protected]>
>> Date: Thu, 24 Mar 2016 20:43:25 +0000
>> Subject: [PATCH] PR69770 -mlong-calls does not affect calls to
>> __gnu_mcount_nc
>> generated by -pg
>>
>> gcc/ChangeLog:
>>
>> 2016-03-24 Charles Baylis <[email protected]>
>>
>> * config/arm/arm-protos.h (arm_emit_long_call_profile): New function.
>> * config/arm/arm.c (arm_emit_long_call_profile_insn): New function.
>> (arm_expand_prologue): Likewise.
>> (thumb1_expand_prologue): Likewise.
>> (arm_output_long_call_to_profile_func): Likewise.
>> (arm_emit_long_call_profile): Likewise.
>> * config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment.
>> * config/arm/arm.md (arm_long_call_profile): New pattern.
>> * config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New
>> define.
>> * config/arm/thumb1.md (thumb1_long_call_profile): New pattern.
>> * config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-03-24 Charles Baylis <[email protected]>
>>
>> * gcc.target/arm/pr69770.c: New test.
>>
>> Change-Id: I9b8de01fea083f17f729c3801f83174bedb3b0c6
>>
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index 0083673..324c9f4 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -343,6 +343,7 @@ extern void arm_register_target_pragmas (void);
>> extern void arm_cpu_cpp_builtins (struct cpp_reader *);
>>
>> extern bool arm_is_constant_pool_ref (rtx);
>> +void arm_emit_long_call_profile ();
>>
>> /* Flags used to identify the presence of processor capabilities. */
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index c868490..040b255 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -21426,6 +21426,22 @@ output_probe_stack_range (rtx reg1, rtx reg2)
>> return "";
>> }
>>
>> +static void
>> +arm_emit_long_call_profile_insn ()
>
> s/()/(void)
>
>> +{
>> + rtx sym_ref = gen_rtx_SYMBOL_REF (Pmode, "__gnu_mcount_nc");
>> + /* if movt/movw are not available, use a constant pool */
>> + if (!arm_arch_thumb2)
>
> Please don't do that - use TARGET_USE_MOVT && !target_word_relocations
> please. And better still look at adding !target_word_relocations to the
> define of TARGET_USE_MOVT.
>
>> + {
>> + sym_ref = force_const_mem(Pmode, sym_ref);
>
> Space between force_const_mem and `('
>
>> + }
>> + rtvec vec = gen_rtvec (1, sym_ref);
>> + rtx tmp =
>> + gen_rtx_UNSPEC_VOLATILE (VOIDmode, vec, VUNSPEC_LONG_CALL_PROFILE);
>> + emit_insn (tmp);
>> +}
>> +
>
>> +
>> /* Generate the prologue instructions for entry into an ARM or Thumb-2
>> function. */
>> void
>> @@ -21789,6 +21805,10 @@ arm_expand_prologue (void)
>> arm_load_pic_register (mask);
>> }
>>
>> + if (crtl->profile && TARGET_LONG_CALLS
>> + && ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS)
>> + arm_emit_long_call_profile_insn ();
>> +
>> /* If we are profiling, make sure no instructions are scheduled before
>> the call to mcount. Similarly if the user has requested no
>> scheduling in the prolog. Similarly if we want non-call exceptions
>> @@ -24985,6 +25005,10 @@ thumb1_expand_prologue (void)
>> if (frame_pointer_needed)
>> thumb_set_frame_pointer (offsets);
>>
>> + if (crtl->profile && TARGET_LONG_CALLS
>> + && ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS)
>> + arm_emit_long_call_profile_insn ();
>> +
>> /* If we are profiling, make sure no instructions are scheduled before
>> the call to mcount. Similarly if the user has requested no
>> scheduling in the prolog. Similarly if we want non-call exceptions
>> @@ -30289,4 +30313,70 @@ arm_sched_fusion_priority (rtx_insn *insn, int
>> max_pri,
>> return;
>> }
>>
>> +static void
>> +arm_output_long_call_to_profile_func (rtx * operands, bool push_scratch)
>> +{
>
> Missing comment above function.
Blank lines between if blocks please
>
>> + /* operands[0] is the address of the __gnu_mcount_nc function
>> + operands[1] is the scratch register we use to load that address */
>> + if (push_scratch)
>> + output_asm_insn ("push\t{%1}", operands);
Here .
>> + output_asm_insn ("push\t{lr}", operands);
>> + if (GET_CODE (operands[0]) == SYMBOL_REF)
>> + {
>> + output_asm_insn ("movw\t%1, #:lower16:%c0", operands);
>> + output_asm_insn ("movt\t%1, #:upper16:%c0", operands);
>> + }
>> + else
>> + {
>> + output_asm_insn ("ldr\t%1, %0", operands);
>> + }
Here.
>> + if (!arm_arch5)
>> + {
>> + output_asm_insn ("mov\tlr, pc", operands);
>> + output_asm_insn ("mov\tpc, %1", operands);
>> + }
>> + else
>> + output_asm_insn ("blx\t%1", operands);
Here.
>> + if (push_scratch)
>> + output_asm_insn ("pop\t{%1}", operands);
Here and check the rest of the patch.
>> +}
>> +
>
>
>> +void
>> +arm_emit_long_call_profile()
>
> s/()/ (void)
>
> Missing comment.
>
>> +{
>> + rtx alcp = NULL;
>> + rtx operands[2];
>> + bool push_scratch;
>> + /* find the arm_long_call_profile */
>> + for (rtx_insn * insn = get_insns (); insn; insn = NEXT_INSN (insn))
>> + {
>> + if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END)
>> + break;
>> + if (INSN_CODE (insn) == CODE_FOR_arm_long_call_profile
>> + || INSN_CODE (insn) == CODE_FOR_thumb1_long_call_profile)
>> + {
>> + alcp = PATTERN (insn);
>> + break;
>> + }
>> + }
>> + gcc_assert (alcp);
>> +
>> + operands[0] = XEXP (XEXP (alcp, 0), 0);
>> + if (TARGET_32BIT)
>> + {
>> + operands[1] = gen_rtx_REG (SImode, IP_REGNUM);
>> + push_scratch = false;
>> + }
>> + else
>> + {
>> + /* for nested functions, we can set push_scratch to false, since
>> + final.c:profile_function.c and ASM_OUTPUT_REG_PUSH preserve it as
>> + part of the sequence to preserve ip across the call to the
>> + profiling function. */
>> + operands[1] = gen_rtx_REG (SImode, R0_REGNUM + 7);
>> + push_scratch = !IS_NESTED (arm_current_func_type ());
>> + }
>> + arm_output_long_call_to_profile_func (operands, push_scratch);
>> +}
>> +
>> #include "gt-arm.h"
>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>> index 6352140..44f1c64 100644
>> --- a/gcc/config/arm/arm.h
>> +++ b/gcc/config/arm/arm.h
>> @@ -2044,7 +2044,9 @@ extern int making_const_table;
>> that ASM_OUTPUT_REG_PUSH will be matched with ASM_OUTPUT_REG_POP, and
>> that r7 isn't used by the function profiler, so we can use it as a
>> scratch reg. WARNING: This isn't safe in the general case! It may be
>> - sensitive to future changes in final.c:profile_function. */
>> + sensitive to future changes in final.c:profile_function. This is also
>> + relied on in arm_emit_long_call_profile() which assumes r7 can be
>> + used as a scratch register to load the address of the function profiler.
>> */
>> #define ASM_OUTPUT_REG_PUSH(STREAM, REGNO) \
>> do \
>> { \
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index 47171b9..beb44d5 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -11424,6 +11424,15 @@
>> DONE;
>> })
>>
>> +(define_insn "arm_long_call_profile"
>> + [(unspec_volatile [(match_operand:SI 0 "general_operand" "ji,m")
>> + ] VUNSPEC_LONG_CALL_PROFILE)]
>> + "TARGET_32BIT"
>> + "%@ arm_long_call_profile"
>> + [(set_attr "arm_pool_range" "*,4096")
>> + (set_attr "arm_neg_pool_range" "*,4084")]
>> +)
>
> Please set the length of this insn appropriately. IIUC this is the length of
> the instructions from FUNCTION_PROFILER, though I suspect we could push all
> of this out into the RTL stream. Underestimating the length of this insn can
> cause issues with the minipool placement code later especially with -pg.
>
>> +
>> ;; Vector bits common to IWMMXT and Neon
>> (include "vec-common.md")
>> ;; Load the Intel Wireless Multimedia Extension patterns
>> diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
>> index 5d6c4ed..7acf66b 100644
>> --- a/gcc/config/arm/bpabi.h
>> +++ b/gcc/config/arm/bpabi.h
>> @@ -174,11 +174,18 @@
>>
>> #undef NO_PROFILE_COUNTERS
>> #define NO_PROFILE_COUNTERS 1
>> +#undef ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS
>> +#define ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS 1
>
> Why do we need this ?
>
>> #undef ARM_FUNCTION_PROFILER
>> #define ARM_FUNCTION_PROFILER(STREAM, LABELNO) \
>> { \
>> - fprintf (STREAM, "\tpush\t{lr}\n");
>> \
>> - fprintf (STREAM, "\tbl\t__gnu_mcount_nc\n");
>> \
>> + if (TARGET_LONG_CALLS) \
>> + {
>> \
>> + arm_emit_long_call_profile(); \
>> + } else { \
>> + fprintf (STREAM, "\tpush\t{lr}\n");
>> \
>> + fprintf (STREAM, "\tbl\t__gnu_mcount_nc\n"); \
>> + } \
>> }
>>
>
> While you are here delete the definition of FUNCTION_PROFILER that is
> conditional on THUMB_FUNCTION_PROFILER, looks like that code is dead.
>
>> #undef SUBTARGET_FRAME_POINTER_REQUIRED
>> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
>> index 072ed4d..9b1de5f 100644
>> --- a/gcc/config/arm/thumb1.md
>> +++ b/gcc/config/arm/thumb1.md
>> @@ -1798,7 +1798,7 @@
>> [(unspec_volatile [(match_operand:SI 0 "s_register_operand" "l")]
>> VUNSPEC_EH_RETURN)
>> (clobber (match_scratch:SI 1 "=&l"))]
>> - "TARGET_THUMB1"
>> + "TARGET_THUMB1 && 0"
>
> This means it wasn't tested for Thumb1 or you had to turn this off to get
> proper results for Thumb1 ?
>
> Please retest.
>
>> "#"
>> "&& reload_completed"
>> [(const_int 0)]
>> @@ -1809,4 +1809,13 @@
>> }"
>> [(set_attr "type" "mov_reg")]
>> )
>> +
>> +(define_insn "thumb1_long_call_profile"
>> + [(unspec_volatile [(match_operand:SI 0 "general_operand" "j,m")
>> + ] VUNSPEC_LONG_CALL_PROFILE)]
>> + "TARGET_THUMB1"
>> + "%@ thumb1_long_call_profile"
>> + [(set_attr "pool_range" "1018")]
>> +)
>> +
>>
>
> Please set length accordingly as well as check on the pool_range here.
>
>> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
>> index 5744c62..d7ddc3a 100644
>> --- a/gcc/config/arm/unspecs.md
>> +++ b/gcc/config/arm/unspecs.md
>> @@ -148,6 +148,7 @@
>> VUNSPEC_GET_FPSCR ; Represent fetch of FPSCR content.
>> VUNSPEC_SET_FPSCR ; Represent assign of FPSCR content.
>> VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing.
>> + VUNSPEC_LONG_CALL_PROFILE ; Represent a long call to profile function
>> ])
>>
>> ;; Enumerators for NEON unspecs.
>> diff --git a/gcc/testsuite/gcc.target/arm/pr69770.c
>> b/gcc/testsuite/gcc.target/arm/pr69770.c
>> new file mode 100644
>> index 0000000..61e5c6d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr69770.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-pg -mlong-calls" } */
>> +
>> +extern void g(void);
>> +
>> +int f() { g(); return 0; }
>> +
>> +/* { dg-final { scan-assembler-not "bl\[ \t\]+__gnu_mcount_nc" } } */
>> +/* { dg-final { scan-assembler "__gnu_mcount_nc" } } */
>> -- 1.9.1
>>