Richard Biener <richard.guent...@gmail.com> writes:
> On Fri, Oct 11, 2013 at 12:48 PM, Richard Sandiford
> <rdsandif...@googlemail.com> wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On Fri, Oct 11, 2013 at 11:43 AM, Richard Sandiford
>>> <rdsandif...@googlemail.com> wrote:
>>>> Jakub Jelinek <ja...@redhat.com> writes:
>>>>> On Fri, Oct 11, 2013 at 10:17:41AM +0200, Richard Biener wrote:
>>>>>> asm(".alias __sync_synchronize sync_synchronize");
>>>>>
>>>>> It is .set, but not everywhere.
>>>>> /* The MIPS assembler has different syntax for .set. We set it to
>>>>>    .dummy to trap any errors.  */
>>>>> #undef SET_ASM_OP
>>>>> #define SET_ASM_OP "\t.dummy\t"
>>>>> But perhaps it would require fewer variants than providing inline asm
>>>>> of the __sync_* builtin by hand for all the targets that need it.
>>>>
>>>> Yeah, that's why I prefer the sed approach.  GCC knows how to do exactly
>>>> what we want, and what syntax to use.  We just need to take its output and
>>>> change the function name.
>>>>
>>>> And like Richard says, parsing top-level asms would be fair game,
>>>> especially if GCC and binutils ever were integrated (for libgccjit.so).
>>>> So using top-level asms seems like putting off the inevitable
>>>> (albeit putting it off further than __asm renaming).
>>>>
>>>> Do either of you object to the sed thing?
>>>
>>> Well, ideally there would be a way to not lie about symbol names to GCC.
>>> That is, have a "native" way of telling GCC what to do here (which is
>>> as far as I understand to emit the code for the builtin-handled $SYM
>>> in a function with $SYM - provide the out-of-line copy for a builtin).
>>>
>>> For the __sync functions it's unfortunate that the library function has
>>> the same 'name' as the builtin and the builtin doesn't have an alternate
>>> spelling.  So - can't we just add __builtin__sync_... spellings and use
>>>
>>> __sync_synchronize ()
>>> {
>>>   __builtin_sync_syncronize ();
>>> }
>>>
>>> ?  (what if __builtin_sync_syncronize expands to a libcall?  if it can't,
>>> what's the point of the library function?)
>>
>> It can't expand to a libcall in nomips16 mode.  It always expands to a
>> libcall in mips16 mode.  The point is to provide out-of-line nomips16
>> functions that the mips16 code can call.
>>
>>> Why does a simple
>>>
>>> __sync_synchronize ()
>>> {
>>>   __sync_synchronize ();
>>> }
>>>
>>> not work?  On x86_64 I get from that:
>>>
>>> __sync_synchronize:
>>> .LFB0:
>>>         .cfi_startproc
>>>         mfence
>>>         ret
>>>         .cfi_endproc
>>>
>>> (similar to how you can have a definition of memcpy () and have
>>> another memcpy inside it inline-expanded)
>>
>> Is that with optimisation enabled?  -O2 gives me:
>>
>> __sync_synchronize:
>> .LFB0:
>>         .cfi_startproc
>>         .p2align 4,,10
>>         .p2align 3
>> .L2:
>>         jmp     .L2
>>         .cfi_endproc
>> .LFE0:
>>
>> We do want to compile this stuff with optimisation enabled.
>
> I compiled with -O1 only.  Yes, at -O2 I get the infinite loop.
>
> Eventually we should simply not build cgraph edges _from_ calls
> to builtins?  Or disable tail recursion for builtin calls (tail-recursion
> is what does this optimization).
>
> Honza?  For tailr this boils down to symtab_semantically_equivalent_p ().
> I suppose we don't want to change that but eventually special case
> builtins in tailr, thus
>
> Index: gcc/tree-tailcall.c
> ===================================================================
> --- gcc/tree-tailcall.c (revision 203409)
> +++ gcc/tree-tailcall.c (working copy)
> @@ -446,7 +446,9 @@ find_tail_calls (basic_block bb, struct
>    /* We found the call, check whether it is suitable.  */
>    tail_recursion = false;
>    func = gimple_call_fndecl (call);
> -  if (func && recursive_call_p (current_function_decl, func))
> +  if (func
> +      && ! DECL_BUILT_IN (func)
> +      && recursive_call_p (current_function_decl, func))
>      {
>        tree arg;
>
> which makes -O2 not turn it into an infinite loop (possibly also applies
> to the original code with the alias declaration?)

If that's OK then I'm certainly happy with it :-)  FWIW, the alias case
was first optimised from __sync_synchronize->sync_synchronize, before it
got converted into a tail call.  That happened very early in the pipeline
and I suppose would stop the built-in expansion from kicking in,
even with tail calls disabled.  But if we say that:

foo () { foo (); }

is a supported way of defining out-of-line versions of built-in foo,
then that's much more convenient than the aliases anyway.

Thanks,
Richard

Reply via email to