On Wed, 23 Aug 2017, Uros Bizjak wrote: > On Wed, Aug 23, 2017 at 10:04 AM, Richard Biener <rguent...@suse.de> wrote: > > On Wed, 23 Aug 2017, Richard Biener wrote: > > > >> On Tue, 22 Aug 2017, Richard Biener wrote: > >> > >> > On August 22, 2017 6:38:55 PM GMT+02:00, "Richard Earnshaw (lists)" > >> > <richard.earns...@arm.com> wrote: > >> > >On 22/08/17 13:55, Kyrill Tkachov wrote: > >> > >> Hi Richard, > >> > >> [roping in more aarch64 maintainers] > >> > >> > >> > >> On 22/08/17 13:27, Richard Biener wrote: > >> > >>> On Tue, 22 Aug 2017, Uros Bizjak wrote: > >> > >>> > >> > >>>> On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener > >> > ><rguent...@suse.de> > >> > >>>> wrote: > >> > >>>>> The following patch fixes PR81921 (and LTO build of libgo) which I > >> > >ran > >> > >>>>> into when trying to enable free-lang-data for non-LTO compiles. > >> > >>>>> > >> > >>>>> free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all > >> > >functions > >> > >>>>> so we have them ending up with target_option_default_node > >> > >eventually > >> > >>>>> which is something ix86_can_inline_p doesn't expect (I tried > >> > >forcing > >> > >>>>> a compare of the actual options but that fails as well as we get > >> > >>>>> spurious differences in use-fpmath, the default node with -m32 > >> > >>>>> -march=x86_64 doesn't have it while non-default nodes have it). > >> > >>>>> > >> > >>>>> The patch is what I consider safe for branches, we might want to > >> > >work > >> > >>>>> on sth better (actually comparing always and fixing the fpmath > >> > >issue) > >> > >>>>> on trunk as followup. > >> > >>>>> > >> > >>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for > >> > >trunk > >> > >>>>> and active branches? > >> > >>>>> > >> > >>>>> Note the change to the ref = false conditional isn't strictly > >> > >necessary > >> > >>>>> but it makes -flto and non-flto behave consistently. > > > > ... > > > >> > >> The aarch64 changes looks ok to me FWIW (since I wrote that function) > >> > >> > >> > > > >> > >I *think* this is probably OK for AArch64, but I don't think it's safe > >> > >in general. > >> > > > >> > >Consider, for example, the arm port where we can have functions built > >> > >for ARM and functions built for Thumb. You cannot, in general, inline > >> > >either into the other, (think inline asm, or isa-specific intrinsic > >> > >operations). > >> > > > >> > >I think this approach only works at all because it's generally > >> > >meaningless to have some ISA that is the default and then construct > >> > >other functions later in the compilation that deliberately target a > >> > >smaller subset of the ISA. But such an assumption doesn't apply when > >> > >you construct functions in completely separate ISAs which are > >> > >call-compatible but otherwise distinct. > >> > > >> > Note the patch doesn't change anything for NON-LTO, where you never see > >> > target_option_default_node. Just LTO forces those in place of NULL to > >> > make cross-TU behavior more like you suggest. > >> > >> So thinking about this some more we lack the ability to distinguish > >> between no explicit target attribute and an explicit target attribute > >> that matches the global defaults with LTO. Without LTO it is the > >> target that decides whether to put target_option_default_node into > >> DECL_FUNCTION_SPECIFIC_TARGET but LTO treats the global options > >> as explicit target attribute for all functions of a TU. > >> > >> That said, the > >> > >> /* If callee has no option attributes, then it is ok to inline */ > >> if (!callee_opts) > >> ret = true; > >> > >> tries to allow inlining of functions with no explicit attribute > >> into any other function which IMHO is reasonable? > >> > >> That argues for sth like DECL_FUNCTION_SPECIFIC_TARGET_USER_P and > >> DECL_FUNCTION_SPECIFIC_OPTIMIZATION_USER_P. > >> > >> Does that make any sense? The question is whether, in LTO context, > >> differing global options are USER_P or not ... > >> > >> OTOH as you say usually function specific opts are a superset of > >> global opts which means if the target properly checks two option > >> sets against each other the effect should be similar. > > > > Which means we can also do the following (only the default and the x86 > > hook changed), always compare and properly treat a not present > > DECL_FUNCTION_SPECIFIC_TARGET as target_option_default_node. > > > > To fix PR81921 we need to make sure to process target("sse2") when > > it might change fpmath use (for -m32). Not sure if that captures > > all cases so I wonder whether skipping the processing is not > > premature optimization -- we should do this at most once per function? > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk > > for the x86 part? Is the ix86_valid_target_attribute_tree change ok > > for branches? > > > > Note that as there is the chance this now rejects cases that it didn't > > before even w/o -flto such change isn't appropriate for the branches > > (the ix86_valid_target_attribute_tree change is). Note I arrived > > here because enabling free-lang-data unconditionally broke bootstrap, > > so the changes to the other archs where a similar thing may happen -- > > you can check with > > > > Index: gcc/tree.c > > =================================================================== > > --- gcc/tree.c (revision 251274) > > +++ gcc/tree.c (working copy) > > @@ -5660,8 +5660,7 @@ free_lang_data (void) > > unsigned i; > > > > /* If we are the LTO frontend we have freed lang-specific data already. > > */ > > - if (in_lto_p > > - || (!flag_generate_lto && !flag_generate_offload)) > > + if (in_lto_p) > > return 0; > > > > /* Provide a dummy TRANSLATION_UNIT_DECL if the FE failed to provide > > one. */ > > > > Richard, can you add a testcase where it is invalid to inline > > -mthumb/arm code into the other? I suppose that requires some > > inline asm to break. Or is there already one? > > > > Thanks, > > Richard. > > > > 2017-08-23 Richard Biener <rguent...@suse.de> > > > > PR target/81921 > > * targhooks.c (default_target_can_inline_p): Properly > > use target_option_default_node when no DECL_FUNCTION_SPECIFIC_TARGET > > is present and always compare. > > * config/i386/i386.c (ix86_valid_target_attribute_tree): Make > > sure to process the attribute when fpmath may change. > > (ix86_can_inline_p): Properly use target_option_default_node when > > no DECL_FUNCTION_SPECIFIC_TARGET is present and always compare. > > > > Index: gcc/targhooks.c > > =================================================================== > > --- gcc/targhooks.c (revision 251275) > > +++ gcc/targhooks.c (working copy) > > @@ -1442,27 +1442,18 @@ default_target_option_pragma_parse (tree > > bool > > default_target_can_inline_p (tree caller, tree callee) > > { > > - bool ret = false; > > tree callee_opts = DECL_FUNCTION_SPECIFIC_TARGET (callee); > > tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller); > > - > > - /* If callee has no option attributes, then it is ok to inline */ > > - if (!callee_opts) > > - ret = true; > > - > > - /* If caller has no option attributes, but callee does then it is not ok > > to > > - inline */ > > - else if (!caller_opts) > > - ret = false; > > + if (! callee_opts) > > + callee_opts = target_option_default_node; > > + if (! caller_opts) > > + caller_opts = target_option_default_node; > > > > /* If both caller and callee have attributes, assume that if the > > pointer is different, the two functions have different target > > options since build_target_option_node uses a hash table for the > > options. */ > > - else > > - ret = (callee_opts == caller_opts); > > - > > - return ret; > > + return callee_opts == caller_opts; > > } > > > > /* If the machine does not have a case insn that compares the bounds, > > Index: gcc/config/i386/i386.c > > =================================================================== > > --- gcc/config/i386/i386.c (revision 251275) > > +++ gcc/config/i386/i386.c (working copy) > > @@ -7369,7 +7369,8 @@ ix86_valid_target_attribute_tree (tree a > > || opts->x_target_flags != def->x_target_flags > > || option_strings[IX86_FUNCTION_SPECIFIC_ARCH] > > || option_strings[IX86_FUNCTION_SPECIFIC_TUNE] > > - || enum_opts_set.x_ix86_fpmath) > > + || enum_opts_set.x_ix86_fpmath > > + || !TARGET_64BIT_P (opts->x_ix86_isa_flags)) > > You should use (!TARGET_64BIT_P ... && TARGET_SSE_P ...) to match the > condition in the special fpmath processing part. (BTW: The comment in > that part is wrong, we need SSE, not sse2 on 32-bit targets to use SSE > math.)
Hmpf, with the change I again run into the libgo bootstrap fail (appearantly my reduced testcase is incomplete) libtool: compile: /abuild/rguenther/obj2/./gcc/xgcc -B/abuild/rguenther/obj2/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -B/usr/local/x86_64-pc-linux-gnu/lib/ -isystem /usr/local/x86_64-pc-linux-gnu/include -isystem /usr/local/x86_64-pc-linux-gnu/sys-include -m32 -DHAVE_CONFIG_H -I. -I/space/rguenther/src/svn/trunk2/libgo -I /space/rguenther/src/svn/trunk2/libgo/runtime -I/space/rguenther/src/svn/trunk2/libgo/../libffi/include -I../libffi/include -pthread -fexceptions -fnon-call-exceptions -fplan9-extensions -fsplit-stack -Wall -Wextra -Wwrite-strings -Wcast-qual -Werror -minline-all-stringops -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I /space/rguenther/src/svn/trunk2/libgo/../libgcc -I /space/rguenther/src/svn/trunk2/libgo/../libbacktrace -I ../../../gcc/include -g -O2 -m32 -MT aeshash.lo -MD -MP -MF .deps/aeshash.Tpo -c /space/rguenther/src/svn/trunk2/libgo/runtime/aeshash.c -fPIC -DPIC -o .libs/aeshash.o In file included from /space/rguenther/src/svn/trunk2/libgo/runtime/aeshash.c:17:0: /space/rguenther/src/svn/trunk2/libgo/runtime/aeshash.c: In function ‘aeshashbody’: /abuild/rguenther/obj2/./gcc/include/emmintrin.h:700:1: error: inlining failed in call to always_inline ‘_mm_loadu_si128’: target specific option mismatch _mm_loadu_si128 (__m128i_u const *__P) ^~~~~~~~~~~~~~~ /space/rguenther/src/svn/trunk2/libgo/runtime/aeshash.c:387:11: note: called from here mseed ^= _mm_loadu_si128(aeskeysched.__values); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ that appens even with making the if a if (1). The check we run into is somehow still else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath) ret = false; ah ... #ifndef __SSE__ #pragma GCC push_options #pragma GCC target("sse") #define __DISABLE_SSE__ #endif /* __SSE__ */ but on x86_64 when building with -m32 we _do_ have __SSE__ set so the intrinsics get target_option_default_node which doesn't have fmpath=sse. So x86_64 -m32 -march=i586 enables SSE math for the intrinsics while x86_64 -m32 [-march=x86-64] does not... Which means we have to fix the intrinsic headers to unconditionally push "sse" or relax the comparison and allow mismatching fpmath? Richard. > Other x86 changes LGTM. > > Uros. > > > { > > /* If we are using the default tune= or arch=, undo the string > > assigned, > > and use the default. */ > > @@ -7502,53 +7503,47 @@ ix86_valid_target_attribute_p (tree fnde > > static bool > > ix86_can_inline_p (tree caller, tree callee) > > { > > - bool ret = false; > > tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller); > > tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee); > > - > > - /* If callee has no option attributes, then it is ok to inline. */ > > if (!callee_tree) > > - ret = true; > > + callee_tree = target_option_default_node; > > + if (!caller_tree) > > + caller_tree = target_option_default_node; > > + if (callee_tree == caller_tree) > > + return true; > > > > - /* If caller has no option attributes, but callee does then it is not ok > > to > > - inline. */ > > - else if (!caller_tree) > > + struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree); > > + struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); > > + bool ret = false; > > + > > + /* Callee's isa options should be a subset of the caller's, i.e. a SSE4 > > + function can inline a SSE2 function but a SSE2 function can't inline > > + a SSE4 function. */ > > + if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags) > > + != callee_opts->x_ix86_isa_flags) > > + || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2) > > + != callee_opts->x_ix86_isa_flags2)) > > ret = false; > > > > - else > > - { > > - struct cl_target_option *caller_opts = TREE_TARGET_OPTION > > (caller_tree); > > - struct cl_target_option *callee_opts = TREE_TARGET_OPTION > > (callee_tree); > > + /* See if we have the same non-isa options. */ > > + else if (caller_opts->x_target_flags != callee_opts->x_target_flags) > > + ret = false; > > > > - /* Callee's isa options should be a subset of the caller's, i.e. a > > SSE4 > > - function can inline a SSE2 function but a SSE2 function can't > > inline > > - a SSE4 function. */ > > - if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags) > > - != callee_opts->x_ix86_isa_flags) > > - || ((caller_opts->x_ix86_isa_flags2 & > > callee_opts->x_ix86_isa_flags2) > > - != callee_opts->x_ix86_isa_flags2)) > > - ret = false; > > - > > - /* See if we have the same non-isa options. */ > > - else if (caller_opts->x_target_flags != callee_opts->x_target_flags) > > - ret = false; > > - > > - /* See if arch, tune, etc. are the same. */ > > - else if (caller_opts->arch != callee_opts->arch) > > - ret = false; > > - > > - else if (caller_opts->tune != callee_opts->tune) > > - ret = false; > > + /* See if arch, tune, etc. are the same. */ > > + else if (caller_opts->arch != callee_opts->arch) > > + ret = false; > > > > - else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath) > > - ret = false; > > + else if (caller_opts->tune != callee_opts->tune) > > + ret = false; > > > > - else if (caller_opts->branch_cost != callee_opts->branch_cost) > > - ret = false; > > + else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath) > > + ret = false; > > > > - else > > - ret = true; > > - } > > + else if (caller_opts->branch_cost != callee_opts->branch_cost) > > + ret = false; > > + > > + else > > + ret = true; > > > > return ret; > > } > > Index: gcc/testsuite/gcc.target/i386/pr81921.c > > =================================================================== > > --- gcc/testsuite/gcc.target/i386/pr81921.c (nonexistent) > > +++ gcc/testsuite/gcc.target/i386/pr81921.c (working copy) > > @@ -0,0 +1,14 @@ > > +/* { dg-do compile } */ > > +/* { dg-require-effective-target lto } */ > > +/* { dg-options "-flto -march=x86-64" } */ > > + > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, > > __artificial__, target("sse2"))) > > +_mm_loadu_si128 (int const *__P) > > +{ > > + return *__P; > > +} > > + > > +void __attribute__((target("ssse3"))) foo (void *p) > > +{ > > + volatile int x = _mm_loadu_si128 (p); > > +} > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)