Richard Biener <richard.guent...@gmail.com> writes: > On Fri, Oct 11, 2013 at 12:48 PM, Richard Sandiford > <rdsandif...@googlemail.com> wrote: >> Richard Biener <richard.guent...@gmail.com> writes: >>> On Fri, Oct 11, 2013 at 11:43 AM, Richard Sandiford >>> <rdsandif...@googlemail.com> wrote: >>>> Jakub Jelinek <ja...@redhat.com> writes: >>>>> On Fri, Oct 11, 2013 at 10:17:41AM +0200, Richard Biener wrote: >>>>>> asm(".alias __sync_synchronize sync_synchronize"); >>>>> >>>>> It is .set, but not everywhere. >>>>> /* The MIPS assembler has different syntax for .set. We set it to >>>>> .dummy to trap any errors. */ >>>>> #undef SET_ASM_OP >>>>> #define SET_ASM_OP "\t.dummy\t" >>>>> But perhaps it would require fewer variants than providing inline asm >>>>> of the __sync_* builtin by hand for all the targets that need it. >>>> >>>> Yeah, that's why I prefer the sed approach. GCC knows how to do exactly >>>> what we want, and what syntax to use. We just need to take its output and >>>> change the function name. >>>> >>>> And like Richard says, parsing top-level asms would be fair game, >>>> especially if GCC and binutils ever were integrated (for libgccjit.so). >>>> So using top-level asms seems like putting off the inevitable >>>> (albeit putting it off further than __asm renaming). >>>> >>>> Do either of you object to the sed thing? >>> >>> Well, ideally there would be a way to not lie about symbol names to GCC. >>> That is, have a "native" way of telling GCC what to do here (which is >>> as far as I understand to emit the code for the builtin-handled $SYM >>> in a function with $SYM - provide the out-of-line copy for a builtin). >>> >>> For the __sync functions it's unfortunate that the library function has >>> the same 'name' as the builtin and the builtin doesn't have an alternate >>> spelling. So - can't we just add __builtin__sync_... spellings and use >>> >>> __sync_synchronize () >>> { >>> __builtin_sync_syncronize (); >>> } >>> >>> ? (what if __builtin_sync_syncronize expands to a libcall? if it can't, >>> what's the point of the library function?) >> >> It can't expand to a libcall in nomips16 mode. It always expands to a >> libcall in mips16 mode. The point is to provide out-of-line nomips16 >> functions that the mips16 code can call. >> >>> Why does a simple >>> >>> __sync_synchronize () >>> { >>> __sync_synchronize (); >>> } >>> >>> not work? On x86_64 I get from that: >>> >>> __sync_synchronize: >>> .LFB0: >>> .cfi_startproc >>> mfence >>> ret >>> .cfi_endproc >>> >>> (similar to how you can have a definition of memcpy () and have >>> another memcpy inside it inline-expanded) >> >> Is that with optimisation enabled? -O2 gives me: >> >> __sync_synchronize: >> .LFB0: >> .cfi_startproc >> .p2align 4,,10 >> .p2align 3 >> .L2: >> jmp .L2 >> .cfi_endproc >> .LFE0: >> >> We do want to compile this stuff with optimisation enabled. > > I compiled with -O1 only. Yes, at -O2 I get the infinite loop. > > Eventually we should simply not build cgraph edges _from_ calls > to builtins? Or disable tail recursion for builtin calls (tail-recursion > is what does this optimization). > > Honza? For tailr this boils down to symtab_semantically_equivalent_p (). > I suppose we don't want to change that but eventually special case > builtins in tailr, thus > > Index: gcc/tree-tailcall.c > =================================================================== > --- gcc/tree-tailcall.c (revision 203409) > +++ gcc/tree-tailcall.c (working copy) > @@ -446,7 +446,9 @@ find_tail_calls (basic_block bb, struct > /* We found the call, check whether it is suitable. */ > tail_recursion = false; > func = gimple_call_fndecl (call); > - if (func && recursive_call_p (current_function_decl, func)) > + if (func > + && ! DECL_BUILT_IN (func) > + && recursive_call_p (current_function_decl, func)) > { > tree arg; > > which makes -O2 not turn it into an infinite loop (possibly also applies > to the original code with the alias declaration?)
If that's OK then I'm certainly happy with it :-) FWIW, the alias case was first optimised from __sync_synchronize->sync_synchronize, before it got converted into a tail call. That happened very early in the pipeline and I suppose would stop the built-in expansion from kicking in, even with tail calls disabled. But if we say that: foo () { foo (); } is a supported way of defining out-of-line versions of built-in foo, then that's much more convenient than the aliases anyway. Thanks, Richard