Richard Biener <[email protected]> writes:
> On Fri, Oct 11, 2013 at 12:48 PM, Richard Sandiford
> <[email protected]> wrote:
>> Richard Biener <[email protected]> writes:
>>> On Fri, Oct 11, 2013 at 11:43 AM, Richard Sandiford
>>> <[email protected]> wrote:
>>>> Jakub Jelinek <[email protected]> writes:
>>>>> On Fri, Oct 11, 2013 at 10:17:41AM +0200, Richard Biener wrote:
>>>>>> asm(".alias __sync_synchronize sync_synchronize");
>>>>>
>>>>> It is .set, but not everywhere.
>>>>> /* The MIPS assembler has different syntax for .set. We set it to
>>>>> .dummy to trap any errors. */
>>>>> #undef SET_ASM_OP
>>>>> #define SET_ASM_OP "\t.dummy\t"
>>>>> But perhaps it would require fewer variants than providing inline asm
>>>>> of the __sync_* builtin by hand for all the targets that need it.
>>>>
>>>> Yeah, that's why I prefer the sed approach. GCC knows how to do exactly
>>>> what we want, and what syntax to use. We just need to take its output and
>>>> change the function name.
>>>>
>>>> And like Richard says, parsing top-level asms would be fair game,
>>>> especially if GCC and binutils ever were integrated (for libgccjit.so).
>>>> So using top-level asms seems like putting off the inevitable
>>>> (albeit putting it off further than __asm renaming).
>>>>
>>>> Do either of you object to the sed thing?
>>>
>>> Well, ideally there would be a way to not lie about symbol names to GCC.
>>> That is, have a "native" way of telling GCC what to do here (which is
>>> as far as I understand to emit the code for the builtin-handled $SYM
>>> in a function with $SYM - provide the out-of-line copy for a builtin).
>>>
>>> For the __sync functions it's unfortunate that the library function has
>>> the same 'name' as the builtin and the builtin doesn't have an alternate
>>> spelling. So - can't we just add __builtin__sync_... spellings and use
>>>
>>> __sync_synchronize ()
>>> {
>>> __builtin_sync_synchronize ();
>>> }
>>>
>>> ? (what if __builtin_sync_synchronize expands to a libcall? if it can't,
>>> what's the point of the library function?)
>>
>> It can't expand to a libcall in nomips16 mode. It always expands to a
>> libcall in mips16 mode. The point is to provide out-of-line nomips16
>> functions that the mips16 code can call.
>>
>>> Why does a simple
>>>
>>> __sync_synchronize ()
>>> {
>>> __sync_synchronize ();
>>> }
>>>
>>> not work? On x86_64 I get from that:
>>>
>>> __sync_synchronize:
>>> .LFB0:
>>> .cfi_startproc
>>> mfence
>>> ret
>>> .cfi_endproc
>>>
>>> (similar to how you can have a definition of memcpy () and have
>>> another memcpy inside it inline-expanded)
>>
>> Is that with optimisation enabled? -O2 gives me:
>>
>> __sync_synchronize:
>> .LFB0:
>> .cfi_startproc
>> .p2align 4,,10
>> .p2align 3
>> .L2:
>> jmp .L2
>> .cfi_endproc
>> .LFE0:
>>
>> We do want to compile this stuff with optimisation enabled.
>
> I compiled with -O1 only. Yes, at -O2 I get the infinite loop.
>
> Eventually we should simply not build cgraph edges _from_ calls
> to builtins? Or disable tail recursion for builtin calls (tail-recursion
> is what does this optimization).
>
> Honza? For tailr this boils down to symtab_semantically_equivalent_p ().
> I suppose we don't want to change that but eventually special case
> builtins in tailr, thus
>
> Index: gcc/tree-tailcall.c
> ===================================================================
> --- gcc/tree-tailcall.c (revision 203409)
> +++ gcc/tree-tailcall.c (working copy)
> @@ -446,7 +446,9 @@ find_tail_calls (basic_block bb, struct
> /* We found the call, check whether it is suitable. */
> tail_recursion = false;
> func = gimple_call_fndecl (call);
> - if (func && recursive_call_p (current_function_decl, func))
> + if (func
> + && ! DECL_BUILT_IN (func)
> + && recursive_call_p (current_function_decl, func))
> {
> tree arg;
>
> which makes -O2 not turn it into an infinite loop (possibly also applies
> to the original code with the alias declaration?)

If that's OK then I'm certainly happy with it :-) FWIW, the alias case
was first optimised from __sync_synchronize->sync_synchronize, before it
got converted into a tail call. That happened very early in the pipeline
and I suppose would stop the built-in expansion from kicking in,
even with tail calls disabled. But if we say that:

    foo () { foo (); }

is a supported way of defining out-of-line versions of built-in foo,
then that's much more convenient than the aliases anyway.
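
For comparison, here is a rough sketch of the rename-afterwards variant
discussed earlier in the thread: the wrapper is compiled under the
non-clashing name sync_synchronize (the name from the alias example), so
the call inside it can only be the built-in and there is no
self-recursion for the tail-recursion pass to find. The
store_with_barrier caller is hypothetical and only there to exercise the
wrapper; this is an illustration, not the actual libgcc source.

```c
/* Out-of-line wrapper under a name that does not clash with the
   built-in.  In the sed approach the symbol in the generated assembly
   would afterwards be renamed to __sync_synchronize.  */
void
sync_synchronize (void)
{
  /* Expands inline to the target's full memory barrier where the target
     has one (e.g. mfence on x86_64).  */
  __sync_synchronize ();
}

/* Hypothetical caller, for illustration only: store a value, issue the
   barrier through the out-of-line wrapper, then read the value back.  */
int
store_with_barrier (int *slot, int value)
{
  *slot = value;
  sync_synchronize ();
  return *slot;
}
```

With the foo () { foo (); } form the rename step would go away, since
the wrapper could be written under its final name directly.
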
Thanks,
Richard