Help with detection of an invariant

2015-12-07 Thread Dominik Vogt
On S/390 the test case gcc.dg/loop-9.c currently fails:

  void f (double *a)
  {
    int i;
    for (i = 0; i < 100; i++)
      a[i] = 18.4242;
  }

The test seems to expect that the load of 18.4242 into a register is
hoisted out of the loop, but on S/390 it isn't.  It turns out that
move_invariant_reg() is never called from move_invariants()
because the invariants vector is empty.  Now,
find_invariant_insn() checks the insn for invariants using
check_dependencies().

  (insn 29 28 30 3 (set (mem:DF (reg:DI 81 [ ivtmp.8 ]) [0 MEM[base: _15, offset: 0B]+0 S8 A64])
          (const_double:DF 1.842419990222931955941021442413330078125e+1 [0x0.9364c2f837b4ap+5])) .../loop-9.c:9 918 {*movdf_64}
       (nil))

check_dependencies() comes across reg 81 first, decides that it is
not an invariant and returns false, so that find_invariant_insn()
never even looks at the constant.
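
The all-or-nothing behaviour described here can be pictured with a
small toy model in C.  This is only a sketch, not GCC's actual code:
toy_operand and toy_all_operands_invariant are hypothetical names, and
the model assumes an insn is just a flat list of operands.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand kinds for the toy model (not GCC's RTL codes). */
enum toy_operand_kind { TOY_OP_CONSTANT, TOY_OP_REG };

struct toy_operand {
  enum toy_operand_kind kind;
  int regno;          /* meaningful only for TOY_OP_REG */
  bool set_in_loop;   /* meaningful only for TOY_OP_REG */
};

/* Return true only if every operand is loop-invariant.  The check
   gives up at the first register that is set inside the loop, so any
   operands after it are never examined.  */
bool toy_all_operands_invariant (const struct toy_operand *ops, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (ops[i].kind == TOY_OP_REG && ops[i].set_in_loop)
      return false;
  return true;
}
```

For insn 29 the operand list would contain reg 81 (set inside the
loop) followed by the constant, so the toy check returns false without
ever considering the constant on its own.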

Actually, the constant should be moved (from the literal pool) to
a floating point register (and actually is in the assembly
output), and that move could be moved out of the loop (it's not).
Where should I look for the root cause?

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



How to rewrite call targets (OpenACC bind clause)

2015-12-07 Thread Thomas Schwinge
Hi!

I have tried a few things, and got things somewhat working, but I'm not
satisfied with my results so far, so I'd like to ask for help.  OpenACC
specifics are not relevant to my question, which I'm thus formulating in
a very generic way.  (But find an illustrative example at the end of the
email.)

If attached to a function declaration X (using a function attribute,
basically), the OpenACC bind clause specifies that when compiling for an
offloading target, all calls to function X should be diverted to function
Y, and the body of function X be discarded.  X remains the call target
when compiling for the host.  Y may be different per offloading target.
In the generic case, Y will be identified with an assembler name.

The requirements mandate an implementation in the LTO front end (which is
the entry point for every offloading compiler), or later.  Is the LTO
front end the right place to do this?  After read_cgraph_and_symbols or
somewhere else?

As X's body is not going to be used in the offloaded code (it's unreachable), my
first thought was: for all decls (X) that have a bind (Y) clause
attached, set the decl X's assembler name to Y's (using
symtab->change_decl_assembler_name -- or
gcc/varasm.c:set_user_assembler_name?).  That somewhat works, but Y will
then be compiled to X's name, and I saw problems if not only X's
declaration but also its definition were available, because we'd then get
two function definitions with X's (assembler) name, and I didn't manage
to discard only the original (unreachable) X definition while keeping its
decl alive (with assembler name Y), which is still used at all call
sites.  Maybe the wrong approach after all...

I'm able to look up cgraph_node::get_for_asmname([Y]), and I tried
experimenting with cgraph_node::create_alias and resolve_alias (in the
LTO front end) but that also hasn't been completely successful: this
worked if compiling with optimizations (Y even got inlined at the call
site of X, good!), but it didn't work with -O0.

I found the redirect_callee and redirect_call_stmt_to_callee functions of
cgraph_edge -- is that something I should be using?  (Still in the LTO
front end?)

Or, should I do this redirection after the LTO front end, in an early
pass (execute_oacc_device_lower?).  That is, for every
current_function_decl, locate all calls to all functions tagged with a
bind clause, and then rewrite the call sites to Y instead of X?
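
To make the last option concrete, here is a toy model of rewriting
call sites, written in plain C.  It is only a sketch and does not use
any GCC internals: toy_bind, toy_call_site, toy_bind_target and
toy_redirect_calls are all hypothetical names, and callees are
identified by assembler-name strings purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: function FROM carries a bind clause naming TO. */
struct toy_bind {
  const char *from;
  const char *to;
};

/* Hypothetical call site; the callee is given by its assembler name. */
struct toy_call_site {
  const char *callee;
};

/* Return the bind target for NAME, or NAME itself if it has none. */
static const char *toy_bind_target (const struct toy_bind *binds,
                                    size_t nbinds, const char *name)
{
  for (size_t i = 0; i < nbinds; i++)
    if (strcmp (binds[i].from, name) == 0)
      return binds[i].to;
  return name;
}

/* When compiling for an offloading target, rewrite every call to a
   bound function so that it calls the bind target instead.  Host
   compilation leaves all call sites alone.  Returns the number of
   call sites that were redirected.  */
size_t toy_redirect_calls (struct toy_call_site *calls, size_t ncalls,
                           const struct toy_bind *binds, size_t nbinds,
                           int offloading)
{
  size_t redirected = 0;

  if (!offloading)
    return 0;

  for (size_t i = 0; i < ncalls; i++)
    {
      const char *target = toy_bind_target (binds, nbinds, calls[i].callee);
      if (target != calls[i].callee)
        {
          calls[i].callee = target;
          redirected++;
        }
    }
  return redirected;
}
```

In this model the declaration of X is left untouched and only the
edges are changed, which sidesteps the duplicate-definition problem of
the assembler-name approach; whether that maps cleanly onto
cgraph_edge is exactly the open question.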


An illustrative example:

#pragma acc routine
int Y()
{
  return 2;
}

#pragma acc routine bind(Y)
int X()
{
  return 1;
}

int main()
{
  int ret;
#pragma acc parallel copyout(ret)
  ret = X();

  return ret;
}

If running with ACC_DEVICE_TYPE=host, this should return 1, and if
running with ACC_DEVICE_TYPE=not_host, it should return 2.
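
Without any OpenACC support, the intended semantics can be mimicked in
plain C.  This is only a model of the expected result, not of how the
compiler would implement it: call_X and its is_offload_target parameter
are hypothetical.

```c
/* Stand-ins for the two routines from the example above. */
static int Y (void) { return 2; }
static int X (void) { return 1; }

/* Hypothetical model of a call to X inside a compute region:
   on an offloading target the call is diverted to Y, on the
   host it still reaches X.  */
int call_X (int is_offload_target)
{
  return is_offload_target ? Y () : X ();
}
```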


Grüße
 Thomas


Re: RFC: Intel386 psABI version 1.1 draft

2015-12-07 Thread H.J. Lu
On Tue, Nov 24, 2015 at 8:16 AM, H.J. Lu  wrote:
> Hi,
>
> Here is the Intel386 psABI version 1.1 draft:
>
> https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-20151120.pdf
>
> Main changes are
>
> 1. Add AVX-512 support.
> 2. Add linker optimization to combine GOTPLT and GOT slots.
> 3. Add R_386_GOT32X relocation and linker optimization.
> 4. Add FS/GS Base addresses to DWARF register number mapping.
> 5. Add Intel MPX support.
>
> MPX support has been checked into GCC 5.  Linker optimization has
> been added to ld in binutils 2.25 and gold in binutils 2.26.  Gold and ld in
> binutils 2.26 support the new relocations.  Ld in binutils 2.26 can optimize
> the new relocations.
>
> Any comments or feedback?
>

Here is the Intel386 psABI version 1.1:

https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-1.1.pdf

-- 
H.J.


Re: Help with detection of an invariant

2015-12-07 Thread Andrew Pinski
On Mon, Dec 7, 2015 at 6:44 AM, Dominik Vogt  wrote:
> On S/390 the test case gcc.dg/loop-9.c currently fails:
>
>   void f (double *a)
>   {
>     int i;
>     for (i = 0; i < 100; i++)
>       a[i] = 18.4242;
>   }
>
> The test seems to expect that the load of 18.4242 into a register is
> hoisted out of the loop, but on S/390 it isn't.  It turns out that
> move_invariant_reg() is never called from move_invariants()
> because the invariants vector is empty.  Now,
> find_invariant_insn() checks the insn for invariants using
> check_dependencies().
>
>   (insn 29 28 30 3 (set (mem:DF (reg:DI 81 [ ivtmp.8 ]) [0 MEM[base: _15, offset: 0B]+0 S8 A64])
>           (const_double:DF 1.842419990222931955941021442413330078125e+1 [0x0.9364c2f837b4ap+5])) .../loop-9.c:9 918 {*movdf_64}
>        (nil))
>
> check_dependencies() comes across reg 81 first, decides that it is
> not an invariant and returns false, so that find_invariant_insn()
> never even looks at the constant.
>
> Actually, the constant should be moved (from the literal pool) to
> a floating point register (and actually is in the assembly
> output), and that move could be moved out of the loop (it's not).
> Where should I look for the root cause?


Hmm,
  I want to say the predicates on movdf_64 are too loose, allowing the
above when they should not.  That is, movdf_64 should have pushed the
load of the fp constant into its own pseudo-register and used that to do
the store.

Thanks,
Andrew

>
> Ciao
>
> Dominik ^_^  ^_^
>
> --
>
> Dominik Vogt
> IBM Germany
>


Re: Help with detection of an invariant

2015-12-07 Thread Dominik Vogt
On Mon, Dec 07, 2015 at 11:48:10AM -0800, Andrew Pinski wrote:
> On Mon, Dec 7, 2015 at 6:44 AM, Dominik Vogt  wrote:
> > On S/390 the test case gcc.dg/loop-9.c currently fails:
> >
> >   void f (double *a)
> >   {
> >     int i;
> >     for (i = 0; i < 100; i++)
> >       a[i] = 18.4242;
> >   }
> >
> > The test seems to expect that the load of 18.4242 into a register is
> > hoisted out of the loop, but on S/390 it isn't.  It turns out that
> > move_invariant_reg() is never called from move_invariants()
> > because the invariants vector is empty.  Now,
> > find_invariant_insn() checks the insn for invariants using
> > check_dependencies().
> >
> >   (insn 29 28 30 3 (set (mem:DF (reg:DI 81 [ ivtmp.8 ]) [0 MEM[base: _15, offset: 0B]+0 S8 A64])
> >           (const_double:DF 1.842419990222931955941021442413330078125e+1 [0x0.9364c2f837b4ap+5])) .../loop-9.c:9 918 {*movdf_64}
> >        (nil))
> >
> > check_dependencies() comes across reg 81 first, decides that it is
> > not an invariant and returns false, so that find_invariant_insn()
> > never even looks at the constant.
> >
> > Actually, the constant should be moved (from the literal pool) to
> > a floating point register (and actually is in the assembly
> > output), and that move could be moved out of the loop (it's not).
> > Where should I look for the root cause?
> 
> 
> Hmm,
>   I want to say the predicates on movdf_64 are too loose, allowing the
> above when they should not.  That is, movdf_64 should have pushed the
> load of the fp constant into its own pseudo-register and used that to do
> the store.

Thanks!  It turns out that, for historical reasons, S/390[x] moves such
constants to the literal pool only after the loop-invariants pass.
That is because on old CPUs it was more efficient to do a direct
memory-to-memory move with the MVC instruction (i.e. moving the
constant to a register early is harmful on old CPUs).
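
At the source level, the two strategies look roughly as follows.  This
is only an analogy under stated assumptions: literal_pool_entry,
fill_mem_to_mem and fill_hoisted are made-up names, and an optimizing
compiler is free to turn either function into the other.

```c
#include <string.h>

/* Stand-in for a literal pool entry: the constant lives in memory. */
static const double literal_pool_entry = 18.4242;

/* Strategy A: copy memory to memory on every iteration, which is the
   shape an MVC-based store would have.  */
void fill_mem_to_mem (double *a, int n)
{
  for (int i = 0; i < n; i++)
    memcpy (&a[i], &literal_pool_entry, sizeof a[i]);
}

/* Strategy B: load the constant once before the loop and store it
   from a local on every iteration, which is what gcc.dg/loop-9.c
   expects the loop-invariants pass to achieve.  */
void fill_hoisted (double *a, int n)
{
  const double c = literal_pool_entry;
  for (int i = 0; i < n; i++)
    a[i] = c;
}
```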

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany