On Tue, Oct 15, 2019 at 2:18 AM Peter Bergner <[email protected]> wrote:
>
> On 10/14/19 2:57 PM, Segher Boessenkool wrote:
> > On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
> >> The general case should be that if the caller ISA supports the callee one
> >> then inlining is OK. If this is not wanted in some cases then there are
> >> options like using a noinline attribute.
> >
> > I agree, and that is what we already do afaik.
>
> I agree on making sure the caller's ISA supports the callee's ISA
> before allowing inlining...and I think that's what our code is "trying"
> to do.  It just has some bugs.
>
>
> > But in this case, the callee explicitly disables something (-mno-vsx),
> > while the caller has it enabled (all modern cpus have VSX). If it ends
> > up being inlined, it will get VSX insns generated for it. Which is what
> > Jiu Fu's patch aims to prevent.
> >
> > So you are saying the GCC policy is that you should use noinline on the
> > callee in such cases?
>
> I don't think sprinkling noinlines around will work, given LTO can
> inline random functions from one object file into a function in another
> object file and we have no idea what options were used for both files.
> I think that would just force us to end up putting noinline on all
> functions that might be compiled with LTO.
There's a function attribute for this.
> I think we just need to fix the bug in the current logic when checking
> whether the caller's ISA flags supports the callee's ISA flags. ...and
> for that, I think we just need to add a test that enforces that the
> caller's ISA flags match exactly the callee's flags, for those flags
> that were explicitly set in the callee. The patch below seems to fix
> the issue (regtesting now). Does this look like what we want?
I believe this is going to bite you exactly in the case you want the
opposite behavior. If you have CUs compiled with defaults and
a specialized one with VSX that calls into generic compiled functions
you _do_ want to allow inlining into the VSX enabled routines. Just
think of LTO, C++ and comdats - you'll get a random comdat entity
at link time for inlining - either from the VSX CU or the non-VSX one.
Richard.
> Peter
>
>
> gcc/
> * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit
> options.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr70010.c: New test.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c (revision 276975)
> +++ gcc/config/rs6000/rs6000.c (working copy)
> @@ -23976,13 +23976,18 @@ rs6000_can_inline_p (tree caller, tree c
> else
> {
>        struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
> + HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
>        struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
> + HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
> + HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
>
> -      /* Callee's options should a subset of the caller's, i.e. a vsx function
> -        can inline an altivec function but a non-vsx function can't inline a
> -        vsx function.  */
> -      if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
> -         == callee_opts->x_rs6000_isa_flags)
> +      /* The callee's options must be a subset of the caller's options, i.e.
> +        a vsx function may inline an altivec function, but a non-vsx function
> +        must not inline a vsx function.  However, for those options that the
> +        callee has explicitly set, we must enforce that the callee's and
> +        caller's options match exactly; see PR70010.  */
> + if (((caller_isa & callee_isa) == callee_isa)
> + && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
> ret = true;
> }
>
> Index: gcc/testsuite/gcc.target/powerpc/pr70010.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/pr70010.c (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c (working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -finline" } */
> +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */
> +
> +typedef int vec_t __attribute__((vector_size(16)));
> +
> +static vec_t
> +__attribute__((__target__("no-vsx")))
> +vadd_no_vsx (vec_t a, vec_t b)
> +{
> + return a + b;
> +}
> +
> +vec_t
> +__attribute__((__target__("vsx")))
> +call_vadd_no_vsx (vec_t x, vec_t y, vec_t z)
> +{
> + return vadd_no_vsx (x, y) - z;
> +}
>
>
>
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.i
> typedef int vec_t __attribute__((vector_size(16)));
>
> static vec_t
> __attribute__((__target__("no-vsx")))
> vadd_no_vsx (vec_t a, vec_t b)
> {
> return a + b;
> }
>
> vec_t
> __attribute__((__target__("vsx")))
> call_vadd_no_vsx (vec_t x, vec_t y, vec_t z)
> {
> return vadd_no_vsx (x, y) - z;
> }
>
> bergner@pike:~/gcc/BUGS/CPU$ old-gcc -O2 -finline -S pr70010.i
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.s
> call_vadd_no_vsx:
> vadduwm 2,2,3
> vsubuwm 2,2,4
> blr
>
> bergner@pike:~/gcc/BUGS/CPU$
> /home/bergner/gcc/build/gcc-fsf-mainline-cpu-flags-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-cpu-flags-debug/gcc -O2 -finline -S pr70010.i
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.s
> vadd_no_vsx:
> vadduwm 2,2,3
> blr
> .long 0
>
> call_vadd_no_vsx:
> 0: addis 2,12,.TOC.-.LCF1@ha
> addi 2,2,.TOC.-.LCF1@l
> mflr 0
> std 0,16(1)
> stdu 1,-48(1)
> li 0,32
> stvx 31,1,0
> xxlor 63,36,36
> bl vadd_no_vsx
> addi 1,1,48
> li 0,-16
> vsubuwm 2,2,31
> lvx 31,1,0
> ld 0,16(1)
> mtlr 0
> blr