On Wed, 23 Aug 2017, Uros Bizjak wrote:

> On Wed, Aug 23, 2017 at 10:04 AM, Richard Biener <rguent...@suse.de> wrote:
> > On Wed, 23 Aug 2017, Richard Biener wrote:
> >
> >> On Tue, 22 Aug 2017, Richard Biener wrote:
> >>
> >> > On August 22, 2017 6:38:55 PM GMT+02:00, "Richard Earnshaw (lists)" 
> >> > <richard.earns...@arm.com> wrote:
> >> > >On 22/08/17 13:55, Kyrill Tkachov wrote:
> >> > >> Hi Richard,
> >> > >> [roping in more aarch64 maintainers]
> >> > >>
> >> > >> On 22/08/17 13:27, Richard Biener wrote:
> >> > >>> On Tue, 22 Aug 2017, Uros Bizjak wrote:
> >> > >>>
> >> > >>>> On Tue, Aug 22, 2017 at 12:15 PM, Richard Biener
> >> > ><rguent...@suse.de>
> >> > >>>> wrote:
> >> > >>>>> The following patch fixes PR81921 (and LTO build of libgo) which I
> >> > >ran
> >> > >>>>> into when trying to enable free-lang-data for non-LTO compiles.
> >> > >>>>>
> >> > >>>>> free-lang-data forces a DECL_FUNCTION_SPECIFIC_TARGET for all
> >> > >functions
> >> > >>>>> so we have them ending up with target_option_default_node
> >> > >eventually
> >> > >>>>> which is something ix86_can_inline_p doesn't expect (I tried
> >> > >forcing
> >> > >>>>> a compare of the actual options but that fails as well as we get
> >> > >>>>> spurious differences in use-fpmath, the default node with -m32
> >> > >>>>> -march=x86_64 doesn't have it while non-default nodes have it).
> >> > >>>>>
> >> > >>>>> The patch is what I consider safe for branches, we might want to
> >> > >work
> >> > >>>>> on sth better (actually comparing always and fixing the fpmath
> >> > >issue)
> >> > >>>>> on trunk as followup.
> >> > >>>>>
> >> > >>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for
> >> > >trunk
> >> > >>>>> and active branches?
> >> > >>>>>
> >> > >>>>> Note the change to the ref = false conditional isn't strictly
> >> > >necessary
> >> > >>>>> but it makes -flto and non-flto behave consistently.
> >
> > ...
> >
> >> > >> The aarch64 changes looks ok to me FWIW (since I wrote that function)
> >> > >>
> >> > >
> >> > >I *think* this is probably OK for AArch64, but I don't think it's safe
> >> > >in general.
> >> > >
> >> > >Consider, for example, the arm port where we can have functions built
> >> > >for ARM and functions built for Thumb.  You cannot, in general, inline
> >> > >either into the other, (think inline asm, or isa-specific intrinsic
> >> > >operations).
> >> > >
> >> > >I think this approach only works at all because it's generally
> >> > >meaningless to have some ISA that is the default and then construct
> >> > >other functions later in the compilation that deliberately target a
> >> > >smaller subset of the ISA.  But such an assumption doesn't apply when
> >> > >you construct functions in completely separate ISAs which are
> >> > >call-compatible but otherwise distinct.
> >> >
> >> > Note the patch doesn't change anything for NON-LTO, where you never see
> >> > target_option_default_node. Just LTO forces those in place of NULL to
> >> > make cross-TU behavior more like you suggest.
> >>
> >> So thinking about this some more we lack the ability to distinguish
> >> between no explicit target attribute and an explicit target attribute
> >> that matches the global defaults with LTO.  Without LTO it is the
> >> target that decides whether to put target_option_default_node into
> >> DECL_FUNCTION_SPECIFIC_TARGET but LTO treats the global options
> >> as explicit target attribute for all functions of a TU.
> >>
> >> That said, the
> >>
> >>   /* If callee has no option attributes, then it is ok to inline */
> >>   if (!callee_opts)
> >>     ret = true;
> >>
> >> tries to allow inlining of functions with no explicit attribute
> >> into any other function which IMHO is reasonable?
> >>
> >> That argues for sth like DECL_FUNCTION_SPECIFIC_TARGET_USER_P and
> >> DECL_FUNCTION_SPECIFIC_OPTIMIZATION_USER_P.
> >>
> >> Does that make any sense?  The question is whether, in LTO context,
> >> differing global options are USER_P or not ...
> >>
> >> OTOH as you say usually function specific opts are a superset of
> >> global opts which means if the target properly checks two option
> >> sets against each other the effect should be similar.
> >
> > Which means we can also do the following (only the default and the x86
> > hook changed), always compare and properly treat a not present
> > DECL_FUNCTION_SPECIFIC_TARGET as target_option_default_node.
> >
> > To fix PR81921 we need to make sure to process target("sse2") when
> > it might change fpmath use (for -m32).  Not sure if that captures
> > all cases so I wonder whether skipping the processing is not
> > premature optimization -- we should do this at most once per function?
> >
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu, ok for trunk
> > for the x86 part?  Is the ix86_valid_target_attribute_tree change ok
> > for branches?
> >
> > Note that as there is the chance this now rejects cases that it didn't
> > before even w/o -flto such change isn't appropriate for the branches
> > (the ix86_valid_target_attribute_tree change is).  Note I arrived
> > here because enabling free-lang-data unconditionally broke bootstrap,
> > so the changes to the other archs where a similar thing may happen --
> > you can check with
> >
> > Index: gcc/tree.c
> > ===================================================================
> > --- gcc/tree.c  (revision 251274)
> > +++ gcc/tree.c  (working copy)
> > @@ -5660,8 +5660,7 @@ free_lang_data (void)
> >    unsigned i;
> >
> >    /* If we are the LTO frontend we have freed lang-specific data already.
> > */
> > -  if (in_lto_p
> > -      || (!flag_generate_lto && !flag_generate_offload))
> > +  if (in_lto_p)
> >      return 0;
> >
> >    /* Provide a dummy TRANSLATION_UNIT_DECL if the FE failed to provide
> > one.  */
> >
> > Richard, can you add a testcase where it is invalid to inline
> > -mthumb/arm code into the other?  I suppose that requires some
> > inline asm to break.  Or is there already one?
> >
> > Thanks,
> > Richard.
> >
> > 2017-08-23  Richard Biener  <rguent...@suse.de>
> >
> >         PR target/81921
> >         * targhooks.c (default_target_can_inline_p): Properly
> >         use target_option_default_node when no DECL_FUNCTION_SPECIFIC_TARGET
> >         is present and always compare.
> >         * config/i386/i386.c (ix86_valid_target_attribute_tree): Make
> >         sure to process the attribute when fpmath may change.
> >         (ix86_can_inline_p): Properly use target_option_default_node when
> >         no DECL_FUNCTION_SPECIFIC_TARGET is present and always compare.
> >
> > Index: gcc/targhooks.c
> > ===================================================================
> > --- gcc/targhooks.c     (revision 251275)
> > +++ gcc/targhooks.c     (working copy)
> > @@ -1442,27 +1442,18 @@ default_target_option_pragma_parse (tree
> >  bool
> >  default_target_can_inline_p (tree caller, tree callee)
> >  {
> > -  bool ret = false;
> >    tree callee_opts = DECL_FUNCTION_SPECIFIC_TARGET (callee);
> >    tree caller_opts = DECL_FUNCTION_SPECIFIC_TARGET (caller);
> > -
> > -  /* If callee has no option attributes, then it is ok to inline */
> > -  if (!callee_opts)
> > -    ret = true;
> > -
> > -  /* If caller has no option attributes, but callee does then it is not ok 
> > to
> > -     inline */
> > -  else if (!caller_opts)
> > -    ret = false;
> > +  if (! callee_opts)
> > +    callee_opts = target_option_default_node;
> > +  if (! caller_opts)
> > +    caller_opts = target_option_default_node;
> >
> >    /* If both caller and callee have attributes, assume that if the
> >       pointer is different, the two functions have different target
> >       options since build_target_option_node uses a hash table for the
> >       options.  */
> > -  else
> > -    ret = (callee_opts == caller_opts);
> > -
> > -  return ret;
> > +  return callee_opts == caller_opts;
> >  }
> >
> >  /* If the machine does not have a case insn that compares the bounds,
> > Index: gcc/config/i386/i386.c
> > ===================================================================
> > --- gcc/config/i386/i386.c      (revision 251275)
> > +++ gcc/config/i386/i386.c      (working copy)
> > @@ -7369,7 +7369,8 @@ ix86_valid_target_attribute_tree (tree a
> >        || opts->x_target_flags != def->x_target_flags
> >        || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
> >        || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
> > -      || enum_opts_set.x_ix86_fpmath)
> > +      || enum_opts_set.x_ix86_fpmath
> > +      || !TARGET_64BIT_P (opts->x_ix86_isa_flags))
> 
> You should use (!TARGET_64BIT_P ... && TARGET_SSE_P ...) to match the
> condition in the special fpmath processing part. (BTW: The comment in
> that part is wrong, we need SSE, not sse2 on 32-bit targets to use SSE
> math.)

Hmpf, with the change I again run into the libgo bootstrap fail
(appearantly my reduced testcase is incomplete)

libtool: compile:  /abuild/rguenther/obj2/./gcc/xgcc 
-B/abuild/rguenther/obj2/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ 
-B/usr/local/x86_64-pc-linux-gnu/lib/ -isystem 
/usr/local/x86_64-pc-linux-gnu/include -isystem 
/usr/local/x86_64-pc-linux-gnu/sys-include -m32 -DHAVE_CONFIG_H -I. 
-I/space/rguenther/src/svn/trunk2/libgo -I 
/space/rguenther/src/svn/trunk2/libgo/runtime 
-I/space/rguenther/src/svn/trunk2/libgo/../libffi/include 
-I../libffi/include -pthread -fexceptions -fnon-call-exceptions 
-fplan9-extensions -fsplit-stack -Wall -Wextra -Wwrite-strings -Wcast-qual 
-Werror -minline-all-stringops -D_GNU_SOURCE -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64 -I /space/rguenther/src/svn/trunk2/libgo/../libgcc 
-I /space/rguenther/src/svn/trunk2/libgo/../libbacktrace -I 
../../../gcc/include -g -O2 -m32 -MT aeshash.lo -MD -MP -MF 
.deps/aeshash.Tpo -c 
/space/rguenther/src/svn/trunk2/libgo/runtime/aeshash.c  -fPIC -DPIC -o 
.libs/aeshash.o
In file included from 
/space/rguenther/src/svn/trunk2/libgo/runtime/aeshash.c:17:0:
/space/rguenther/src/svn/trunk2/libgo/runtime/aeshash.c: In function 
‘aeshashbody’:
/abuild/rguenther/obj2/./gcc/include/emmintrin.h:700:1: error: inlining 
failed in call to always_inline ‘_mm_loadu_si128’: target specific option 
mismatch
 _mm_loadu_si128 (__m128i_u const *__P)
 ^~~~~~~~~~~~~~~
/space/rguenther/src/svn/trunk2/libgo/runtime/aeshash.c:387:11: note: 
called from here
  mseed ^= _mm_loadu_si128(aeskeysched.__values);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

that appens even with making the if a if (1).  The check we run into
is somehow still

  else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath)
    ret = false;

ah ...

#ifndef __SSE__
#pragma GCC push_options
#pragma GCC target("sse")
#define __DISABLE_SSE__
#endif /* __SSE__ */

but on x86_64 when building with -m32 we _do_ have __SSE__ set so
the intrinsics get target_option_default_node which doesn't have
fmpath=sse.  So x86_64 -m32 -march=i586 enables SSE math for
the intrinsics while x86_64 -m32 [-march=x86-64] does not...

Which means we have to fix the intrinsic headers to unconditionally
push "sse" or relax the comparison and allow mismatching fpmath?

Richard.

> Other x86 changes LGTM.
> 
> Uros.
> 
> >      {
> >        /* If we are using the default tune= or arch=, undo the string 
> > assigned,
> >          and use the default.  */
> > @@ -7502,53 +7503,47 @@ ix86_valid_target_attribute_p (tree fnde
> >  static bool
> >  ix86_can_inline_p (tree caller, tree callee)
> >  {
> > -  bool ret = false;
> >    tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
> >    tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
> > -
> > -  /* If callee has no option attributes, then it is ok to inline.  */
> >    if (!callee_tree)
> > -    ret = true;
> > +    callee_tree = target_option_default_node;
> > +  if (!caller_tree)
> > +    caller_tree = target_option_default_node;
> > +  if (callee_tree == caller_tree)
> > +    return true;
> >
> > -  /* If caller has no option attributes, but callee does then it is not ok 
> > to
> > -     inline.  */
> > -  else if (!caller_tree)
> > +  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
> > +  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
> > +  bool ret = false;
> > +
> > +  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
> > +     function can inline a SSE2 function but a SSE2 function can't inline
> > +     a SSE4 function.  */
> > +  if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> > +       != callee_opts->x_ix86_isa_flags)
> > +      || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> > +         != callee_opts->x_ix86_isa_flags2))
> >      ret = false;
> >
> > -  else
> > -    {
> > -      struct cl_target_option *caller_opts = TREE_TARGET_OPTION 
> > (caller_tree);
> > -      struct cl_target_option *callee_opts = TREE_TARGET_OPTION 
> > (callee_tree);
> > +  /* See if we have the same non-isa options.  */
> > +  else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
> > +    ret = false;
> >
> > -      /* Callee's isa options should be a subset of the caller's, i.e. a 
> > SSE4
> > -        function can inline a SSE2 function but a SSE2 function can't 
> > inline
> > -        a SSE4 function.  */
> > -      if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> > -          != callee_opts->x_ix86_isa_flags)
> > -         || ((caller_opts->x_ix86_isa_flags2 & 
> > callee_opts->x_ix86_isa_flags2)
> > -             != callee_opts->x_ix86_isa_flags2))
> > -       ret = false;
> > -
> > -      /* See if we have the same non-isa options.  */
> > -      else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
> > -       ret = false;
> > -
> > -      /* See if arch, tune, etc. are the same.  */
> > -      else if (caller_opts->arch != callee_opts->arch)
> > -       ret = false;
> > -
> > -      else if (caller_opts->tune != callee_opts->tune)
> > -       ret = false;
> > +  /* See if arch, tune, etc. are the same.  */
> > +  else if (caller_opts->arch != callee_opts->arch)
> > +    ret = false;
> >
> > -      else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath)
> > -       ret = false;
> > +  else if (caller_opts->tune != callee_opts->tune)
> > +    ret = false;
> >
> > -      else if (caller_opts->branch_cost != callee_opts->branch_cost)
> > -       ret = false;
> > +  else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath)
> > +    ret = false;
> >
> > -      else
> > -       ret = true;
> > -    }
> > +  else if (caller_opts->branch_cost != callee_opts->branch_cost)
> > +    ret = false;
> > +
> > +  else
> > +    ret = true;
> >
> >    return ret;
> >  }
> > Index: gcc/testsuite/gcc.target/i386/pr81921.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/i386/pr81921.c     (nonexistent)
> > +++ gcc/testsuite/gcc.target/i386/pr81921.c     (working copy)
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target lto } */
> > +/* { dg-options "-flto -march=x86-64" } */
> > +
> > +extern __inline int  __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__, target("sse2")))
> > +_mm_loadu_si128 (int const *__P)
> > +{
> > +    return *__P;
> > +}
> > +
> > +void __attribute__((target("ssse3"))) foo (void *p)
> > +{
> > +  volatile int x = _mm_loadu_si128 (p);
> > +}
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to