Re: [PATCH][gomp4] Plugins Support in LibGOMP (Take 2)

2013-09-23 Thread Ilya Verbin
On 23 Sep 12:26, Jakub Jelinek wrote:
> On Thu, Sep 19, 2013 at 08:09:04PM +0400, Michael V. Zolotukhin wrote:
> > Hi Jakub,
> > 
> > Updated patch and my answers are below.
> 
> Ok for gomp-4_0-branch.

Checked into gomp-4_0-branch by Kirill Yukhin:
http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00692.html

> 
>   Jakub

  -- Ilya


Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.

2013-09-25 Thread Ilya Verbin
On 24 Sep 10:04, Richard Henderson wrote:
> On 08/27/2013 11:37 AM, Kirill Yukhin wrote:
> > Hello,
> > 
> >> This patch is still far too large.
> >>
> >> I think you should split it up based on every single mode iterator that
> >> you need to add or change.
> > 
> > Problem is that some iterators are depend on each other, so patches are
> > not going to be tiny.
> > 
> > Here is 1st one. It extends VF iterator - biggest impact I believe
> > 
> > Is it Ok?
> > 
> > Testing:
> >   1. Bootstrap pass.
> >   2. make check shows no regressions.
> >   3. Spec 2000 & 2006 build show no regressions both with and without 
> > -mavx512f option.
> >   4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f 
> > option.
> 
> 
> Ok.
> 
> 
> r~

Checked into main trunk by Kirill Yukhin:
http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00779.html

  -- Ilya


Re: [gomp4] Tweak GOMP_target{,_data,_update} arguments

2013-09-26 Thread Ilya Verbin
On 19 Sep 11:23, Jakub Jelinek wrote:
> that.  Another complication is dependent shared libraries.
> Consider
> liba.c:
> #pragma omp declare target
> int i;
> int foo (void)
> {
>   return ++i;
> }
> #pragma omp end declare target
> main.c:
> #pragma omp declare target
> extern int i;
> extern int foo (void);
> #pragma omp end declare target
> int main ()
> {
>   int j;
>   #pragma omp target
> {
>   j = i;
>   j += foo ();
> }
>   if (j != 1)
> abort ();
>   return 0;
> }
> gcc -shared -O2 -fpic -fopenmp -o liba.so -Wl,-soname,liba.so liba.c
> gcc -O2 -fopenmp -o main main.c -L. -la
> ./main
> 
> Perhaps the linker plugin can extract the target shared libraries from
> the embedded sections of dependent shared libraries (if any), and link the
> "main" shared library against that, but GOMP_target will need to know that
> it can't just offload main.so, but also has to offload the dependent
> liba.so (and of course libgomp.so.1 from the libgomp plugin).
> What does ICC do in this case?
> 
>   Jakub

Hi Jakub,

Here's what ICC does.
Suppose we have liba.c and main.c, both with target regions:

1. Building liba.c -> liba.so.
A call to offload-runtime library is inserted into _init of liba.so.
Target region is compiled into liba_target.so, and placed into .rodata of
liba.so.

2. Building main.c -> main.exe.
Similarly, a call to offload-runtime library is inserted into _init of main.exe.
Target region is compiled into main_target.so, and placed into .rodata of
main.exe.

3. Runtime.
So, when liba.so and main.exe are loaded at host-side, the runtime library
knows, that it should transfer liba_target.so and main_target.so to the
target-side.  Then, main.exe starts execution.  At every entry point to the
target region, runtime library checks whether it should perform an
initialization.  If target is not initialized, runtime library calls
COIProcessCreateFromMemory(main_target.exe), that transfers some standard
main_target.exe to the target and starts it.  Then, runtime library calls
COIProcessLoadLibraryFromMemory(liba_target.so, main_target.so), that transfers
these libraries to the target and loads them into the main_target.exe.
The target-side functions are called from host through
COIProcessGetFunctionHandles("f_name") and COIPipelineRunFunction(handle). The
addresses of target-side functions are obtained from *_target.so by dlsym().
So, the host-side knows nothing about target addresses.

What do you think, how will such an approach work with other target
architectures, and with current implementation of GOMP_target{,_data,_update}?

Thanks,
  -- Ilya


Re: [hsa merge 02/10] Modifications to libgomp proper

2016-01-20 Thread Ilya Verbin
On Wed, Jan 13, 2016 at 18:39:27 +0100, Martin Jambor wrote:
> diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
> b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> index 68f7b2c..58ef595 100644
> --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> @@ -528,7 +528,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst_ptr, const 
> void *src_ptr,
>  
>  extern "C" void
>  GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
> - void *async_data)
> + void **, void *async_data)
>  {
>TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p, async_data = %p)", 
> device,
>tgt_fn, tgt_vars, async_data);
> @@ -544,7 +544,7 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void 
> *tgt_vars,
>  }
>  
>  extern "C" void
> -GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
> +GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, void **)
>  {
>TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, 
> tgt_vars);

This breaks GOMP_OFFLOAD_run.  Committed as obvious.


2016-01-20  Ilya Verbin  

liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_run): Pass extra NULL
to GOMP_OFFLOAD_async_run.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 58ef595..57accb4 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -548,5 +548,5 @@ GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, 
void **)
 {
   TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, 
tgt_vars);
 
-  GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL);
+  GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL, NULL);
 }


  -- Ilya


Re: [hsa merge 02/10] Modifications to libgomp proper

2016-01-20 Thread Ilya Verbin
On Wed, Jan 13, 2016 at 18:39:27 +0100, Martin Jambor wrote:
>   * task.c (GOMP_PLUGIN_target_task_completion): Free
>   firstprivate_copies.

Also this change caused 3 fails on intelmicemul:

FAIL: libgomp.c/target-32.c execution test
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test

Because ttask->firstprivate_copies is uninitialized for 
!GOMP_OFFLOAD_CAP_SHARED_MEM.

(gdb) p ttask->firstprivate_copies
$1 = (void *) 0x1
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x003b076800dc in free () from /lib64/libc.so.6
(gdb) bt
#0  0x003b076800dc in free () from /lib64/libc.so.6
#1  0x77dda871 in GOMP_PLUGIN_target_task_completion (data=0x624ac0) at 
gcc/libgomp/task.c:585
[...]


OK for trunk?

libgomp/
* task.c (gomp_create_target_task): Set firstprivate_copies to NULL.

diff --git a/libgomp/task.c b/libgomp/task.c
index 0f45c44..38d4e9b 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -683,6 +683,7 @@ gomp_create_target_task (struct gomp_device_descr *devicep,
   ttask->state = state;
   ttask->task = task;
   ttask->team = team;
+  ttask->firstprivate_copies = NULL;
   task->fn = NULL;
   task->fn_data = ttask;
   task->final_task = 0;

  -- Ilya


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-20 Thread Ilya Verbin
On Fri, Jan 15, 2016 at 21:05:47 +0300, Ilya Verbin wrote:
> On Fri, Jan 15, 2016 at 17:45:22 +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > > How do other accelerators cope with the situation when half of the
> > > > > application is compiled with the accelerator disabled?  (Would some of
> > > > > their calls to GOMP_target_ext lead to abort?)
> > > > 
> > > > GOMP_target_ext should never abort (unless internal error), worst case 
> > > > it
> > > > just falls back into the host fallback.
> > > 
> > > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > > I mean when someone expects that all target regions are executed on the 
> > > device,
> > > but in fact some of them are silently executed on the host with different 
> > > data
> > > environment.
> > 
> > E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> > For XeonPhi we hopefully can offload anything.
> 
> As you said, if compilation of target image fails with ICE or somehow, host
> fallback and offloading to other targets should still work:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00951.html
> That patch was not applied, but it can be simulated by -foffload=disable,

I agree that OpenMP doesn't guarantee that all target regions must be executed
on the device, but in this case a user can't be sure that some library function
always will offload (because the library might be replaced by fallback version),
and he/she will have to write something like:

{
  map_data_to_target ();
  some_library1_fn_with_offload ();
  get_data_from_target ();   /* ! */
  send_data_to_target ();/* ! */
  some_library2_fn_with_offload ();
  get_data_from_target ();   /* ! */
  send_data_to_target ();/* ! */
  some_library3_fn_with_offload ();
  unmap_data_from_target ();
}

If you're OK with this, I'll install this patch:


libgomp/
* target.c (gomp_get_target_fn_addr): Allow host fallback if target
function wasn't mapped to the device with non-shared memory.

diff --git a/libgomp/target.c b/libgomp/target.c
index f1f5849..96fe3d5 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1436,12 +1436,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
*devicep,
   splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
   gomp_mutex_unlock (&devicep->lock);
   if (tgt_fn == NULL)
-   {
- if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
-   return NULL;
- else
-   gomp_fatal ("Target function wasn't mapped");
-   }
+   return NULL;
 
   return (void *) tgt_fn->tgt_offset;
 }

  -- Ilya


Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-01-25 Thread Ilya Verbin
Hi!

On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >index 62e5454..cdaee41 100644
> >--- a/gcc/lto-cgraph.c
> >+++ b/gcc/lto-cgraph.c
> >@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >   tree fn_decl
> > = lto_file_decl_data_get_fn_decl (file_data, decl_index);
> >   vec_safe_push (offload_funcs, fn_decl);
> >+
> >+  /* Prevent IPA from removing fn_decl as unreachable, since there
> >+ may be no refs from the parent function to child_fn in offload
> >+ LTO mode.  */
> >+  cgraph_node::get (fn_decl)->mark_force_output ();
> > }
> >   else if (tag == LTO_symtab_variable)
> > {
> >@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >   tree var_decl
> > = lto_file_decl_data_get_var_decl (file_data, decl_index);
> >   vec_safe_push (offload_vars, var_decl);
> >+
> >+  /* Prevent IPA from removing var_decl as unused, since there
> >+ may be no refs to var_decl in offload LTO mode.  */
> >+  varpool_node::get (var_decl)->force_output = 1;
> > }

This doesn't work when there is more than one LTO partition, because only first
partition contains full offload table to maintain correct order, but cgraph and
varpool nodes aren't necessarily created for the first partition.  To reproduce:

$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c/for-5.c (internal compiler error)
FAIL: libgomp.c/for-6.c (internal compiler error)
$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* 
--target_board=unix/-flto"
FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.c++/for-13.C (internal compiler error)
FAIL: libgomp.c++/for-14.C (internal compiler error)

  -- Ilya


Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-01-26 Thread Ilya Verbin
On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote:
> On 25/01/16 14:27, Ilya Verbin wrote:
> >On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >>>diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >>>index 62e5454..cdaee41 100644
> >>>--- a/gcc/lto-cgraph.c
> >>>+++ b/gcc/lto-cgraph.c
> >>>@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >>> tree fn_decl
> >>>   = lto_file_decl_data_get_fn_decl (file_data, decl_index);
> >>> vec_safe_push (offload_funcs, fn_decl);
> >>>+
> >>>+/* Prevent IPA from removing fn_decl as unreachable, since there
> >>>+   may be no refs from the parent function to child_fn in offload
> >>>+   LTO mode.  */
> >>>+cgraph_node::get (fn_decl)->mark_force_output ();
> >>>   }
> >>> else if (tag == LTO_symtab_variable)
> >>>   {
> >>>@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >>> tree var_decl
> >>>   = lto_file_decl_data_get_var_decl (file_data, decl_index);
> >>> vec_safe_push (offload_vars, var_decl);
> >>>+
> >>>+/* Prevent IPA from removing var_decl as unused, since there
> >>>+   may be no refs to var_decl in offload LTO mode.  */
> >>>+varpool_node::get (var_decl)->force_output = 1;
> >>>   }
> >
> >This doesn't work when there is more than one LTO partition, because only 
> >first
> >partition contains full offload table to maintain correct order, but cgraph 
> >and
> >varpool nodes aren't necessarily created for the first partition.  To 
> >reproduce:
> >
> >$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* 
> >--target_board=unix/-flto"
> >FAIL: libgomp.c/for-3.c (internal compiler error)
> >FAIL: libgomp.c/for-5.c (internal compiler error)
> >FAIL: libgomp.c/for-6.c (internal compiler error)
> >$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* 
> >--target_board=unix/-flto"
> >FAIL: libgomp.c++/for-11.C (internal compiler error)
> >FAIL: libgomp.c++/for-13.C (internal compiler error)
> >FAIL: libgomp.c++/for-14.C (internal compiler error)
> 
> This works for me.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> 

> Check that cgraph/varpool_node exists before use in input_offload_tables
> 
> 2016-01-26  Tom de Vries  
> 
>   * lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node
>   exists before use.

In this case they will be not marked as force_output in other partitions (except
the first one).

  -- Ilya


Re: [PATCH, PR69607] Mark offload symbols as global in lto

2016-02-08 Thread Ilya Verbin
On Mon, Feb 08, 2016 at 14:00:00 +0100, Tom de Vries wrote:
> when running libgomp.c testsuite with "-flto -flto-partition=1to1
> -fno-toplevel-reorder" we run into many compilation failures like this:
> ...
> /tmp/.ltrans0.ltrans.o:(.gnu.offload_funcs+0x1a0): undefined
> reference to `MAIN__._omp_fn.0'^M
> ...
> 
> The problem is that the offload table is in one lto partition, and the
> function listed in the offload table is in another, without the function
> having been promoted to be visible in the other partition.
> 
> The patch fixes this by promoting the symbols in the offload table such that
> they're visible in all partitions.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Build for nvidia accelerator and reg-tested libgomp with various lto
> settings.

Works fine with intelmic offloading.

  -- Ilya


Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-02-08 Thread Ilya Verbin
On Mon, Feb 08, 2016 at 14:20:11 +0100, Tom de Vries wrote:
> On 26/01/16 14:01, Ilya Verbin wrote:
> >On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote:
> >>On 25/01/16 14:27, Ilya Verbin wrote:
> >>>On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >>>>>diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >>>>>index 62e5454..cdaee41 100644
> >>>>>--- a/gcc/lto-cgraph.c
> >>>>>+++ b/gcc/lto-cgraph.c
> >>>>>@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >>>>>   tree fn_decl
> >>>>> = lto_file_decl_data_get_fn_decl (file_data, 
> >>>>> decl_index);
> >>>>>   vec_safe_push (offload_funcs, fn_decl);
> >>>>>+
> >>>>>+  /* Prevent IPA from removing fn_decl as unreachable, 
> >>>>>since there
> >>>>>+ may be no refs from the parent function to child_fn in 
> >>>>>offload
> >>>>>+ LTO mode.  */
> >>>>>+  cgraph_node::get (fn_decl)->mark_force_output ();
> >>>>> }
> >>>>>   else if (tag == LTO_symtab_variable)
> >>>>> {
> >>>>>@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >>>>>   tree var_decl
> >>>>> = lto_file_decl_data_get_var_decl (file_data, 
> >>>>> decl_index);
> >>>>>   vec_safe_push (offload_vars, var_decl);
> >>>>>+
> >>>>>+  /* Prevent IPA from removing var_decl as unused, since 
> >>>>>there
> >>>>>+ may be no refs to var_decl in offload LTO mode.  */
> >>>>>+  varpool_node::get (var_decl)->force_output = 1;
> >>>>> }
> >>>
> >>>This doesn't work when there is more than one LTO partition, because only 
> >>>first
> >>>partition contains full offload table to maintain correct order, but 
> >>>cgraph and
> >>>varpool nodes aren't necessarily created for the first partition.  To 
> >>>reproduce:
> >>>
> >>>$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* 
> >>>--target_board=unix/-flto"
> >>>FAIL: libgomp.c/for-3.c (internal compiler error)
> >>>FAIL: libgomp.c/for-5.c (internal compiler error)
> >>>FAIL: libgomp.c/for-6.c (internal compiler error)
> >>>$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* 
> >>>--target_board=unix/-flto"
> >>>FAIL: libgomp.c++/for-11.C (internal compiler error)
> >>>FAIL: libgomp.c++/for-13.C (internal compiler error)
> >>>FAIL: libgomp.c++/for-14.C (internal compiler error)
> >>
> >>This works for me.
> >>
> >>OK for trunk?
> >>
> >>Thanks,
> >>- Tom
> >>
> >
> >>Check that cgraph/varpool_node exists before use in input_offload_tables
> >>
> >>2016-01-26  Tom de Vries  
> >>
> >>* lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node
> >>exists before use.
> >
> >In this case they will be not marked as force_output in other partitions 
> >(except
> >the first one).
> 
> AFAIU, that's not the case.
> 
> If we're splitting up lto compilation over partitions, it means we're first
> calling lto1 in WPA mode. We'll read in all offload tables, and mark all
> symbols with force_output, and when writing out the partitions, we'll write
> the offload symbols out with force_output set.
> 
> This updated patch only does the force_output marking for offload symbols in
> WPA or LTO. It's not necessary in LTRANS mode.

You're right, works for me.

  -- Ilya


Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-10 Thread Ilya Verbin
Hi!

On Tue, Jan 19, 2016 at 16:32:13 +0300, Ilya Verbin wrote:
> On Tue, Jan 19, 2016 at 10:36:28 +0100, Jakub Jelinek wrote:
> > On Tue, Jan 19, 2016 at 09:57:01AM +0100, Richard Biener wrote:
> > > On Mon, 18 Jan 2016, Ilya Verbin wrote:
> > > > On Fri, Jan 15, 2016 at 09:15:01 +0100, Richard Biener wrote:
> > > > > On Fri, 15 Jan 2016, Ilya Verbin wrote:
> > > > > > II) The __offload_func_table, __offload_funcs_end, 
> > > > > > __offload_var_table,
> > > > > > __offload_vars_end are now provided by the linker script, instead of
> > > > > > crtoffload{begin,end}.o, this allows to surround all offload 
> > > > > > objects, even
> > > > > > those that are not claimed by lto-plugin.
> > > > > > Unfortunately it works only with ld, but doen't work with gold, 
> > > > > > because
> > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=15373
> > > > > > Any thoughts how to enable this linker script for gold?
> > > > > 
> > > > > The easiest way would probably to add this handling to the default
> > > > > "linker script" in gold.  I don't see an easy way around requiring
> > > > > changes to gold here - maybe dumping the default linker script from
> > > > > bfd and injecting the rules with some scripting so you have a complete
> > > > > script.  Though likely gold won't grok that result.
> > > > > 
> > > > > Really a question for Ian though.
> > > > 
> > > > Or the gcc driver can add crtoffload{begin,end}.o, but the problem is 
> > > > that it
> > > > can't determine whether the program contains offloading or not.  So it 
> > > > can add
> > > > them to all -fopenmp/-fopenacc programs, if the compiler was configured 
> > > > with
> > > > --enable-offload-targets=...  The overhead would be about 340 bytes for
> > > > binaries which doesn't use offloading.  Is this acceptable?  (Jakub?)
> > > 
> > > Can lto-wrapper add them as plugin outputs?  Or does that wreck ordering?
> 
> Currently it's implemented this way, but it will not work after my patch,
> because e.g. offload-without-lto.o and offload-with-lto.o will be linked in
> this order:
> offload-without-lto.o, crtoffloadbegin.o, offload-with-lto.o, crtoffloadend.o
> ^
> (will be not claimed by the plugin)
> 
> But we need this one:
> crtoffloadbegin.o, offload-without-lto.o, offload-with-lto.o, crtoffloadend.o
> 
> > Yeah, if that would work, it would be certainly appreciated, one thing is
> > wasting .text space and relocations in all -fopenmp programs (for -fopenacc
> > programs one kind of assumes there will be some offloading in there),
> > another one some extra constructor/destructor or what that would be even
> > worse.
> 
> They contain only 5 symbols, without constructors/destructors.

This patch adds crtoffload{begin,end}.o to all -fopenmp programs, if they exist.
I couldn't think of a better solution...
Tested using the testcase from the previous mail, e.g.:

$ gcc -DNUM=1 -c -fopenmp test.c -o obj1.o
$ gcc -DNUM=2 -c -fopenmp test.c -o obj2.o
$ gcc -DNUM=3 -c -fopenmp test.c -o obj3.o
$ gcc -DNUM=4 -c -fopenmp test.c -o obj4.o -flto
$ gcc -DNUM=5 -c -fopenmp test.c -o obj5.o
$ gcc -DNUM=6 -c -fopenmp test.c -o obj6.o -flto
$ gcc -DNUM=7 -c -fopenmp test.c -o obj7.o
$ gcc-ar -cvq libtest.a obj3.o obj4.o obj5.o
$ gcc -fopenmp main.c obj1.o obj2.o libtest.a obj6.o obj7.o

And other combinations.


gcc/
PR driver/68463
* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Add
crtoffloadbegin.o for -fopenacc/-fopenmp if it exists.
(GNU_USER_TARGET_ENDFILE_SPEC): Add crtoffloadend.o for
-fopenacc/-fopenmp if it exists.
* lto-wrapper.c (offloadbegin, offloadend): Remove static vars.
(offload_objects_file_name): New static var.
(tool_cleanup): Remove offload_objects_file_name file.
(copy_file): Remove function.
(find_offloadbeginend): Remove function.
(run_gcc): Remove offload_argc and offload_argv.
Get offload_objects_file_name from -foffload-objects=... option.
Read names of object files with offload from this file, pass them to
compile_images_for_offload_targets.  Don't call find_offloadbeginend and
don't pass offloadbegin and offloadend to the linker.  Don't pass
offload non-LTO files to the linker, because now they're not claimed.
lto-plugin/
PR driver/68463
* lto-plugin.c (struct plu

[PATCH][CilkPlus] Fix PR69363

2016-02-17 Thread Ilya Verbin
Hi!

This patch fixes 
Bootstrap and make check passed.  OK for... stage 1?


gcc/c-family/
PR c++/69363
* c-cilkplus.c (c_finish_cilk_clauses): Remove function.
* c-common.h (c_finish_cilk_clauses): Remove declaration.
gcc/c/
PR c++/69363
* c-parser.c (c_parser_cilk_all_clauses): Use c_finish_omp_clauses
instead of c_finish_cilk_clauses.
* c-tree.h (c_finish_omp_clauses): Add new default argument.
* c-typeck.c (c_finish_omp_clauses): Add new argument.  Allow
floating-point variables in the linear clause for Cilk Plus.
gcc/cp/
PR c++/69363
* cp-tree.h (finish_omp_clauses): Add new default argument.
* parser.c (cp_parser_cilk_simd_all_clauses): Use finish_omp_clauses
instead of c_finish_cilk_clauses.
* semantics.c (finish_omp_clauses): Add new argument.  Allow
floating-point variables in the linear clause for Cilk Plus.
gcc/testsuite/
PR c++/69363
* c-c++-common/cilk-plus/PS/clauses3.c: Adjust dg-error string.
* c-c++-common/cilk-plus/PS/clauses4.c: New test.
* c-c++-common/cilk-plus/PS/pr69363.c: New test.


diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 3e7902fd..9f1f364 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -41,56 +41,6 @@ c_check_cilk_loop (location_t loc, tree decl)
   return true;
 }
 
-/* Validate and emit code for <#pragma simd> clauses.  */
-
-tree
-c_finish_cilk_clauses (tree clauses)
-{
-  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-{
-  tree prev = clauses;
-
-  /* If a variable appears in a linear clause it cannot appear in
-any other OMP clause.  */
-  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINEAR)
-   for (tree c2 = clauses; c2; c2 = OMP_CLAUSE_CHAIN (c2))
- {
-   if (c == c2)
- continue;
-   enum omp_clause_code code = OMP_CLAUSE_CODE (c2);
-
-   switch (code)
- {
- case OMP_CLAUSE_LINEAR:
- case OMP_CLAUSE_PRIVATE:
- case OMP_CLAUSE_FIRSTPRIVATE:
- case OMP_CLAUSE_LASTPRIVATE:
- case OMP_CLAUSE_REDUCTION:
-   break;
-
- case OMP_CLAUSE_SAFELEN:
-   goto next;
-
- default:
-   gcc_unreachable ();
- }
-
-   if (OMP_CLAUSE_DECL (c) == OMP_CLAUSE_DECL (c2))
- {
-   error_at (OMP_CLAUSE_LOCATION (c2),
- "variable appears in more than one clause");
-   inform (OMP_CLAUSE_LOCATION (c),
-   "other clause defined here");
-   // Remove problematic clauses.
-   OMP_CLAUSE_CHAIN (prev) = OMP_CLAUSE_CHAIN (c2);
- }
- next:
-   prev = c2;
- }
-}
-  return clauses;
-}
-
 /* Calculate number of iterations of CILK_FOR.  */
 
 tree
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index fa3746c..663e457 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1369,7 +1369,6 @@ extern enum stv_conv scalar_to_vector (location_t loc, 
enum tree_code code,
   tree op0, tree op1, bool);
 
 /* In c-cilkplus.c  */
-extern tree c_finish_cilk_clauses (tree);
 extern tree c_validate_cilk_plus_loop (tree *, int *, void *);
 extern bool c_check_cilk_loop (location_t, tree);
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7a27244..4770f45d 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17427,7 +17427,7 @@ c_parser_cilk_all_clauses (c_parser *parser)
 
  saw_error:
   c_parser_skip_to_pragma_eol (parser);
-  return c_finish_cilk_clauses (clauses);
+  return c_finish_omp_clauses (clauses, false, false, true);
 }
 
 /* This function helps parse the grainsize pragma for a _Cilk_for statement.
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 96ab049..8bfd256 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -661,7 +661,7 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern void c_finish_omp_cancel (location_t, tree);
 extern void c_finish_omp_cancellation_point (location_t, tree);
-extern tree c_finish_omp_clauses (tree, bool, bool = false);
+extern tree c_finish_omp_clauses (tree, bool, bool = false, bool = false);
 extern tree c_build_va_arg (location_t, tree, location_t, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 1122a88..d91bd72 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12527,7 +12527,8 @@ c_find_omp_placeholder_r (tree *tp, int *, void *data)
Remove any elements from the list that are invalid.  */
 
 tree
-c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd)
+c_finish_omp_clauses (tree clauses, bool is_omp,

Re: [PATCH][CilkPlus] Fix PR69363

2016-02-17 Thread Ilya Verbin
On Wed, Feb 17, 2016 at 15:46:00 +0100, Jakub Jelinek wrote:
> On Wed, Feb 17, 2016 at 05:32:58PM +0300, Ilya Verbin wrote:
> > + && !SCALAR_FLOAT_TYPE_P (TREE_TYPE (t))
> > + && TREE_CODE (TREE_TYPE (t)) != POINTER_TYPE)
> > +   {
> > + error_at (OMP_CLAUSE_LOCATION (c),
> > +   "linear clause applied to non-integral, "
> > +   "non-floating, non-pointer variable with type %qT",
> > +   TREE_TYPE (t));
> > + remove = true;
> > + break;
> > +   }
> > +   }
> > + else
> > +   {
> > + if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
> > + && TREE_CODE (TREE_TYPE (t)) != POINTER_TYPE)
> > +   {
> > + error_at (OMP_CLAUSE_LOCATION (c),
> > +   "linear clause applied to non-integral non-pointer "
> 
> This line is too long.  But you could have just done

My editor shows exactly 80 chars.

> > --- a/gcc/cp/semantics.c
> > +++ b/gcc/cp/semantics.c
> 
> > + error ("linear clause applied to non-integral, "
> > +"non-floating, non-pointer variable with %qT type",
> 
> Again too long line, that needs to be wrapped more.

OK, here is 81.

> > +TREE_TYPE (t));
> > + remove = true;
> > + break;
> > +   }
> > +   }
> > + else
> > +   {
> > + if (!INTEGRAL_TYPE_P (type)
> > + && TREE_CODE (type) != POINTER_TYPE)
> > +   {
> > + error ("linear clause applied to non-integral non-pointer 
> > "
> > +"variable with %qT type", TREE_TYPE (t));
> > + remove = true;
> > + break;
> 
> And this can be done like I've hinted above.

OK, here is 81.

  -- Ilya


Re: [PATCH][CilkPlus] Fix PR69363

2016-02-17 Thread Ilya Verbin
On Wed, Feb 17, 2016 at 16:28:34 +0100, Marek Polacek wrote:
> On Wed, Feb 17, 2016 at 04:14:22PM +0100, Jakub Jelinek wrote:
> > On Wed, Feb 17, 2016 at 04:11:44PM +0100, Marek Polacek wrote:
> > > On Wed, Feb 17, 2016 at 06:08:14PM +0300, Ilya Verbin wrote:
> > > > > This line is too long.  But you could have just done
> > > > 
> > > > My editor shows exactly 80 chars.
> > > 
> > > The maximum is 79.
> > 
> > Well, check_GNU_style.sh complains just about one line, and then
> > a prototype.
> > 
> > Lines should not exceed 80 characters.
> > 193:+extern tree finish_omp_clauses  (tree, bool, bool = 
> > false, bool = false);
> > 252:+  error ("linear clause applied to non-integral 
> > non-pointer "
> 
> Maybe it should be fixed with this then.  Because
> <https://www.gnu.org/prep/standards/standards.html#Formatting> says
> "Please keep the length of source lines to 79 characters or less, for maximum
> readability in the widest range of environments."

https://gcc.gnu.org/codingconventions.html#Line says 80.

  -- Ilya


Re: Partial Offloading (was: [hsa merge 07/10] IPA-HSA pass)

2016-02-17 Thread Ilya Verbin
On Thu, Jan 28, 2016 at 12:36:19 +0100, Thomas Schwinge wrote:
> I made an attempt to capture the recent discussion (plus my own
> ideas/understanding) in this new section:
> .  Please
> change/extend, as required.

Thanks for summarizing this.


I'm not very happy how -foffload=disable works in GCC 6, here is a testcase:

int main ()
{
  int x = 10;
  #pragma omp target data map (from: x)
#pragma omp target map (alloc: x)
  x = 20;
  if (x != 10 && x != 20)
__builtin_abort ();
}

On the system with non-shared accelerator it will abort, because "#pragma omp
target data" behaves like offloading is enabled, but "#pragma omp target" runs
on the host.  As the result, at the end of the *target data* region, it tries to
receive x from target and receives 0, or crashes.

We can forbid -foffload=disable option, but I think it's very useful, e.g. for
comparing performance of host vs. accelerator using the same compiler, etc.
Or if the system contains 2 different accelerators, someone might want to
compile only for the first, but libgomp will load 2 plugins, and the program
will crash (instead of doing fallback) if it will try to use the second device.

So, maybe we still need something like this patch?
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01033.html

  -- Ilya


Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper

2016-02-19 Thread Ilya Verbin
On Fri, Feb 19, 2016 at 20:41:58 +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 2 Oct 2014 19:14:57 +0400, Ilya Verbin  wrote:
> > With this patch lto-wrapper performs invocation of mkoffload tool for each
> > offload target.  This tool [...]
> > will compile IR from .gnu.offload_lto_* sections into offload
> > target code and embed the resultant code (offload image) into the new host's
> > object file.
> 
> Consider the following scenario:
> 
> $ cat < CSTS-214-acc.c
> int acc (void)
> {
>   int a;
> 
> #pragma acc parallel num_gangs (1) copyout (a)
>   a = 100;
> 
>   return a;
> }
> $ cat < CSTS-214-test.c
> extern int acc (void);
> 
> int main (void)
> {
>   if (acc () != 100)
> __builtin_abort ();
>   
>   return 0;
> }
> 
> Compile these two files as follows:
> 
> $ [GCC] -fopenacc -c CSTS-214-acc.c
> $ x86_64-linux-gnu-ar -cr CSTS-214-acc.a CSTS-214-acc.o
> $ [GCC] -fopenacc CSTS-214-test.c CSTS-214-acc.a
> 
> The last step will fail -- with incomprehensible diagnostics, ;-) as so
> often when offloading fails...  Here's what's going on: the
> LTO/offloading machinery correctly identifies that it needs to process
> the CSTS-214-acc.c:acc function, present in the CSTS-214-acc.a archive
> file at a certain offset, and it "encodes" that as follows:
> CSTS-214-acc.a@0x9e (see lto-plugin/lto-plugin.c:claim_file_handler, the
> "file->offset != 0" code right at the beginning).  This makes its way
> down through here:
> 
> > --- a/gcc/lto-wrapper.c
> > +++ b/gcc/lto-wrapper.c
> 
> > +/* Copy a file from SRC to DEST.  */
> > +
> > +static void
> > +copy_file (const char *dest, const char *src)
> > +{
> > +  [...]
> > +}
> 
> > @@ -624,6 +852,54 @@ run_gcc (unsigned argc, char *argv[])
> 
> > +  /* If object files contain offload sections, but do not contain LTO 
> > sections,
> > + then there is no need to perform a link-time recompilation, i.e.
> > + lto-wrapper is used only for a compilation of offload images.  */
> > +  if (have_offload && !have_lto)
> > +{
> > +  for (i = 1; i < argc; ++i)
> > +   if ([...])
> > + {
> > +   char *out_file;
> > +   /* Can be ".o" or ".so".  */
> > +   char *ext = strrchr (argv[i], '.');
> > +   if (ext == NULL)
> > + out_file = make_temp_file ("");
> > +   else
> > + out_file = make_temp_file (ext);
> > +   /* The linker will delete the files we give it, so make copies.  */
> > +   copy_file (out_file, argv[i]);
> > +   printf ("%s\n", out_file);
> > + }
> > +[...]
> > +  goto finish;
> > +}
> > +
> >if (lto_mode == LTO_MODE_LTO)
> >  {
> >flto_out = make_temp_file (".lto.o");
> > @@ -850,6 +1126,10 @@ cont:
> >obstack_free (&env_obstack, NULL);
> >  }
> >  
> > + finish:
> > +  if (offloadend)
> > +printf ("%s\n", offloadend);
> > +
> >obstack_free (&argv_obstack, NULL);
> >  }
> 
> When we hit this, for argv "CSTS-214-acc.a@0x9e", the copy_file call will
> fail -- there is no "CSTS-214-acc.a@0x9e" file to copy.  If we strip off
> the "@0x[...]" suffix (but still printf the filename including the
> suffix), then things work.  I copied that bit of code from earlier in
> this function, where the same archive offset handling needs to be done.
> Probably that code should be refactored a bit.
> 
> Also, I wonder if the "ext == NULL" case can really happen, and needs to
> be handled as done in the code cited above, or if that can be simplified?
> (Not yet tested that.)
> 
> Will something like the following be OK to fix this issue, or is that
> something "that should not happen", should be fixed differently?
> 
> --- gcc/lto-wrapper.c
> +++ gcc/lto-wrapper.c
> @@ -1161,15 +1161,31 @@ run_gcc (unsigned argc, char *argv[])
>   && strncmp (argv[i], "-flinker-output=",
>   sizeof ("-flinker-output=") - 1) != 0)
> {
> + char *p;
> + off_t file_offset = 0;
> + long loffset;
> + int consumed;
> + char *filename = argv[i];
> +
> + if ((p = strrchr (argv[i], '@'))
> + && p != argv[i] 
> + && s

Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-20 Thread Ilya Verbin
On Fri, Feb 19, 2016 at 15:53:08 +0100, Jakub Jelinek wrote:
> On Wed, Feb 10, 2016 at 08:19:34PM +0300, Ilya Verbin wrote:
> > This patch adds crtoffload{begin,end}.o to all -fopenmp programs, if they 
> > exist.
> > I couldn't think of a better solution...
> > Tested using the testcase from the previous mail, e.g.:
> > 
> > $ gcc -DNUM=1 -c -fopenmp test.c -o obj1.o
> > $ gcc -DNUM=2 -c -fopenmp test.c -o obj2.o
> > $ gcc -DNUM=3 -c -fopenmp test.c -o obj3.o
> > $ gcc -DNUM=4 -c -fopenmp test.c -o obj4.o -flto
> > $ gcc -DNUM=5 -c -fopenmp test.c -o obj5.o
> > $ gcc -DNUM=6 -c -fopenmp test.c -o obj6.o -flto
> > $ gcc -DNUM=7 -c -fopenmp test.c -o obj7.o
> > $ gcc-ar -cvq libtest.a obj3.o obj4.o obj5.o
> > $ gcc -fopenmp main.c obj1.o obj2.o libtest.a obj6.o obj7.o
> > 
> > And other combinations.
> 
> Looking at this, I think I have no problem with crtoffloadbegin.o being
> included in all -fopenmp/-fopenacc linked programs/shared libraries,
> that just defines the symbols and nothing else.
> I have no problem with the
> __offload_funcs_end/__offload_vars_end part of crtoffloadend.o being
> included too.
> But, I really don't like __OFFLOAD_TABLE__ being added to all programs, that
> wastes real space in data (rodata or relro?) section, and dynamic
> relocations.
> So, perhaps, can we split offloadstuff.c into 3 objects instead of 2,
> crtoffload{begin,end,table}.o*, where the last one would be what
> defines __OFFLOAD_TABLE__, and add the last one only by the linker
> plugin/lto-wrapper/whatever, if any input objects had any offloading stuff
> in it?

Done.  Bootstrapped and regtested, lto-bootstrap in progress.

Thomas, could you please test it using nvptx, including the testcase with static
libraries?

Could this patch be considered for stage4?  On the one hand, this is not a
regression.  On the other hand, it fixes quite serious issues, and it shouldn't
affect non-offloading configurations.


gcc/
PR driver/68463
* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Add
crtoffloadbegin.o for -fopenacc/-fopenmp if it exists.
(GNU_USER_TARGET_ENDFILE_SPEC): Add crtoffloadend.o for
-fopenacc/-fopenmp if it exists.
* lto-wrapper.c (offloadbegin, offloadend): Remove static vars.
(offload_objects_file_name): New static var.
(tool_cleanup): Remove offload_objects_file_name file.
(find_offloadbeginend): Replace with ...
(find_crtoffloadtable): ... this.
(run_gcc): Remove offload_argc and offload_argv.
Get offload_objects_file_name from -foffload-objects=... option.
Read names of object files with offload from this file, pass them to
compile_images_for_offload_targets.  Don't call find_offloadbeginend and
don't pass offloadbegin and offloadend to the linker.  Don't pass
offload non-LTO files to the linker, because now they're not claimed.
libgcc/
PR driver/68463
* Makefile.in (crtoffloadtable$(objext)): New rule.
* configure.ac (extra_parts): Add crtoffloadtable$(objext) if
enable_offload_targets is not empty.
* configure: Regenerate.
* offloadstuff.c: Move __OFFLOAD_TABLE__ from crtoffloadend to
crtoffloadtable.
lto-plugin/
PR driver/68463
* lto-plugin.c (struct plugin_offload_file): New.
(offload_files): Change type.
(offload_files_last, offload_files_last_obj): New.
(offload_files_last_lto): New.
(free_2): Adjust accordingly.
(all_symbols_read_handler): Don't add offload files to lto_arg_ptr.
Don't call free_1 for offload_files.  Write names of object files with
offloading to the temporary file.  Add new option to lto_arg_ptr.
(claim_file_handler): Don't claim file if it contains offload sections
without LTO sections.  If it contains offload sections, add to the list.


diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index 2f1bbcc..2fdb63c 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -49,14 +49,16 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  %{" NO_PIE_SPEC ":crtbegin.o%s}} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
- fvtable-verify=std:vtv_start.o%s}"
+ fvtable-verify=std:vtv_start.o%s} \
+   %{fopenacc|fopenmp:%:if-exists(crtoffloadbegin%O%s)}"
 #else
 #define GNU_USER_TARGET_STARTFILE_SPEC \
   "%{!shared: %{pg|p|profile:gcrt1.o%s;:crt1.o%s}} \
crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
- fvtable-verify=std:vtv_start.o%s}"
+ fvtable-verify=std:vtv_start.o%s} \
+ 

Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-22 Thread Ilya Verbin
2016-02-22 18:13 GMT+03:00 Thomas Schwinge :
> On Sat, 20 Feb 2016 13:54:20 +0300, Ilya Verbin  wrote:
>> On Fri, Feb 19, 2016 at 15:53:08 +0100, Jakub Jelinek wrote:
>> > On Wed, Feb 10, 2016 at 08:19:34PM +0300, Ilya Verbin wrote:
>> > > This patch adds crtoffload{begin,end}.o to all -fopenmp programs, if 
>> > > they exist.
>> > > I couldn't think of a better solution...
>> > > Tested using the testcase from the previous mail, e.g.:
>> > >
>> > > $ gcc -DNUM=1 -c -fopenmp test.c -o obj1.o
>> > > $ gcc -DNUM=2 -c -fopenmp test.c -o obj2.o
>> > > $ gcc -DNUM=3 -c -fopenmp test.c -o obj3.o
>> > > $ gcc -DNUM=4 -c -fopenmp test.c -o obj4.o -flto
>> > > $ gcc -DNUM=5 -c -fopenmp test.c -o obj5.o
>> > > $ gcc -DNUM=6 -c -fopenmp test.c -o obj6.o -flto
>> > > $ gcc -DNUM=7 -c -fopenmp test.c -o obj7.o
>> > > $ gcc-ar -cvq libtest.a obj3.o obj4.o obj5.o
>> > > $ gcc -fopenmp main.c obj1.o obj2.o libtest.a obj6.o obj7.o
>> > >
>> > > And other combinations.
>
>> Thomas, could you please test it using nvptx
>
> It mostly ;-) works.  With nvptx offloading enabled (which you don't
> have, do you?), I'm seeing one test case regress:
>
> [-PASS:-]{+FAIL:+} 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  (test for errors, line 9)
> [-PASS:-]{+FAIL:+} 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  (test for errors, line 13)
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
>
> (Same for C++.)  That testcase, just recently added by Tom in r233237
> "Handle -fdiagnostics-color in lto", specifies 'dg-additional-options
> "-flto -fno-use-linker-plugin"'.  Is that now an unsupported
> combination/configuration?  (I have not yet looked in detail, but it
> appears as if the offloading compilers are no longer being run for
> -fno-use-linker-plugin.)

Yes, it's really hard to fix the "lto + non-lto objects" issue for
no-use-linker-plugin LTO path. In this patch lto-plugin prepares a
list of objects files with offloading and passes it to lto-wrapper, so
I believe we should consider offloading without lto-plugin as
unsupported. I'll update wiki when the patch will be committed.

>> including the testcase with static
>> libraries?
>
> Works in my manual testing if I work around the following issue:
>
>> --- a/gcc/config/gnu-user.h
>> +++ b/gcc/config/gnu-user.h
>> @@ -49,14 +49,16 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
>>  If not, see
>> %{" NO_PIE_SPEC ":crtbegin.o%s}} \
>> %{fvtable-verify=none:%s; \
>>   fvtable-verify=preinit:vtv_start_preinit.o%s; \
>> - fvtable-verify=std:vtv_start.o%s}"
>> + fvtable-verify=std:vtv_start.o%s} \
>> +   %{fopenacc|fopenmp:%:if-exists(crtoffloadbegin%O%s)}"
>
> (..., and similar for others.)  The if-exists spec function only works
> for absolute paths (I have not researched, why?), so it won't locate the
> files for relative -Bbuild-gcc/[...] prefixes, and linking will fail:
>
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x0): undefined reference to 
> `__offload_func_table'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x8): undefined reference to 
> `__offload_funcs_end'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x10): undefined reference to 
> `__offload_var_table'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x18): undefined reference to 
> `__offload_vars_end'
>
> If I use the absolute -B$PWD/build-gcc/[...], it works.  (But there is no
> requirement for -B prefixes to be absolute, as far as I know.)  Why not
> make it a hard error, though, if these files are missing?  Can we use
> something like (untested pseudo-patch):
>
> +#ifdef ENABLE_OFFLOADING
> +# define CRTOFFLOADBEGIN "%{fopenacc|fopenmp:%:crtoffloadbegin%O%s}"
> +#else
> +# define CRTOFFLOADBEGIN ""
> +#endif
>
> @@ -49,14 +49,16 @@ see the files COPYING3 and COPYING.RUNTIME 
> respectively.  If not, see
>   %{" NO_PIE_SPEC ":crtbegin.o%s}} \
> %{fvtable-verify=none:%s; \
>   fvtable-verify=preinit:vtv_start_preinit.o%s; \
> - fvta

Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-24 Thread Ilya Verbin
On Wed, Feb 24, 2016 at 17:13:35 +0100, Thomas Schwinge wrote:
> On Tue, 23 Feb 2016 08:37:07 +0100, Tom de Vries  
> wrote:
> > On 22/02/16 19:07, Ilya Verbin wrote:
> > > 2016-02-22 18:13 GMT+03:00 Thomas Schwinge:
> > >> >On Sat, 20 Feb 2016 13:54:20 +0300, Ilya Verbin  
> > >> >wrote:
> > >>> >>On Fri, Feb 19, 2016 at 15:53:08 +0100, Jakub Jelinek wrote:
> > >>>> >> >On Wed, Feb 10, 2016 at 08:19:34PM +0300, Ilya Verbin wrote:
> > >>>>> >> > >This patch adds crtoffload{begin,end}.o to all -fopenmp 
> > >>>>> >> > >programs, if they exist.
> 
> > >>> >>Thomas, could you please test it using nvptx
> > >> >
> > >> >It mostly;-)  works.  With nvptx offloading enabled (which you don't
> > >> >have, do you?), I'm seeing one test case regress:
> > >> >
> > >> > [-PASS:-]{+FAIL:+} 
> > >> > libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> > >> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  (test for errors, line 
> > >> > 9)
> > >> > [-PASS:-]{+FAIL:+} 
> > >> > libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> > >> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  (test for errors, line 
> > >> > 13)
> > >> > PASS: 
> > >> > libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> > >> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> > >> > [-PASS:-]{+FAIL:+} 
> > >> > libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> > >> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> > >> >
> > >> >(Same for C++.)  That testcase, just recently added by Tom in r233237
> > >> >"Handle -fdiagnostics-color in lto", specifies 'dg-additional-options
> > >> >"-flto -fno-use-linker-plugin"'.  Is that now an unsupported
> > >> >combination/configuration?  (I have not yet looked in detail, but it
> > >> >appears as if the offloading compilers are no longer being run for
> > >> >-fno-use-linker-plugin.)
> > > Yes, it's really hard to fix the "lto + non-lto objects" issue for
> > > no-use-linker-plugin LTO path. In this patch lto-plugin prepares a
> > > list of objects files with offloading and passes it to lto-wrapper, so
> > > I believe we should consider offloading without lto-plugin as
> > > unsupported. I'll update wiki when the patch will be committed.
> 
> Aha, I see.  I guess there's no point in keeping offloading supported for
> the -fno-lto (default) with -fno-use-linker-plugin configuration?
> 
> Ilya, then please remove
> libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c as part of
> your patch, unless Tom thinks it should be changed to a -flto test, but
> without -fno-use-linker-plugin?

OK.

> > Shouldn't we error (or at least warn) then if we compile a file 
> > containing an offload construct with fopenacc/fopenmp and 
> > -fno-use-linker-plugin?
> 
> Yes, that makes sense to me, too.  (Note that, as I understand it,
> -fno-use-linker-plugin may also be the default for certain GCC
> configurations...)  Aside from spec stuff in gcc/gcc.c relating to
> LINK_PLUGIN_SPEC, I see there's some code in
> gcc/gcc.c:driver::maybe_run_linker evaluating the three possible values
> of HAVE_LTO_PLUGIN, but I have not yet thought about how and where to
> conditionalize the diagnostic if attempting to do offloading in an
> unsupported (-fno-use-linker-plugin) configuration.

To print this error someone has to detect that at least one object contains
offload sections, only linker plugin and lto-wrapper can do it.  But if linker
plugin is absent, the lto-wrapper have to open all objects, scan for all
sections, etc.  Looks like too much overhead for a single diagnostic.

  -- Ilya


Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-24 Thread Ilya Verbin
On Mon, Feb 22, 2016 at 16:13:07 +0100, Thomas Schwinge wrote:
> (..., and similar for others.)  The if-exists spec function only works
> for absolute paths (I have not researched, why?), so it won't locate the
> files for relative -Bbuild-gcc/[...] prefixes, and linking will fail:
> 
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x0): undefined reference to 
> `__offload_func_table'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x8): undefined reference to 
> `__offload_funcs_end'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x10): undefined reference to 
> `__offload_var_table'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x18): undefined reference to 
> `__offload_vars_end'
> 
> If I use the absolute -B$PWD/build-gcc/[...], it works.  (But there is no
> requirement for -B prefixes to be absolute, as far as I know.)  Why not
> make it a hard error, though, if these files are missing?  Can we use
> something like (untested pseudo-patch):
> 
> +#ifdef ENABLE_OFFLOADING
> +# define CRTOFFLOADBEGIN "%{fopenacc|fopenmp:%:crtoffloadbegin%O%s}"
> +#else
> +# define CRTOFFLOADBEGIN ""
> +#endif
> 
> @@ -49,14 +49,16 @@ see the files COPYING3 and COPYING.RUNTIME 
> respectively.  If not, see
> %{" NO_PIE_SPEC ":crtbegin.o%s}} \
> %{fvtable-verify=none:%s; \
>   fvtable-verify=preinit:vtv_start_preinit.o%s; \
> - fvtable-verify=std:vtv_start.o%s}"
> + fvtable-verify=std:vtv_start.o%s} \
> +   " CRTOFFLOADBEGIN ")}"

Fixed.  Actually ENABLE_OFFLOADING is always defined (to 0 or to 1).

> To the casual reader, skipping the first offload_files looks like a
> off-by-one error, so I suggest you add a comment "Skip the dummy item at
> the start of the list.", or similar.

Done.

> Ilya, then please remove
> libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c as part of
> your patch, unless Tom thinks it should be changed to a -flto test, but
> without -fno-use-linker-plugin?

Done.
Here is a follow up patch.  OK for trunk?  Bootstrapped and regtested.
Unfortunately I'm unable to run bootstrap-lto:
libdecnumber/dpd/decimal32.c:53:0: error: type of ‘decDigitsFromDPD’ does not 
match original declaration [-Werror=lto-type-mismatch]
[...]


diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index 2fdb63c..b0bf40a 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -35,6 +35,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #undef ASM_APP_OFF
 #define ASM_APP_OFF "#NO_APP\n"
 
+#if ENABLE_OFFLOADING == 1
+#define CRTOFFLOADBEGIN "%{fopenacc|fopenmp:crtoffloadbegin%O%s}"
+#define CRTOFFLOADEND "%{fopenacc|fopenmp:crtoffloadend%O%s}"
+#else
+#define CRTOFFLOADBEGIN ""
+#define CRTOFFLOADEND ""
+#endif
+
 /* Provide a STARTFILE_SPEC appropriate for GNU userspace.  Here we add
the GNU userspace magical crtbegin.o file (see crtstuff.c) which
provides part of the support for getting C++ file-scope static
@@ -50,7 +58,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s} \
-   %{fopenacc|fopenmp:%:if-exists(crtoffloadbegin%O%s)}"
+   " CRTOFFLOADBEGIN
 #else
 #define GNU_USER_TARGET_STARTFILE_SPEC \
   "%{!shared: %{pg|p|profile:gcrt1.o%s;:crt1.o%s}} \
@@ -58,7 +66,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s} \
-   %{fopenacc|fopenmp:%:if-exists(crtoffloadbegin%O%s)}"
+   " CRTOFFLOADBEGIN
 #endif
 #undef  STARTFILE_SPEC
 #define STARTFILE_SPEC GNU_USER_TARGET_STARTFILE_SPEC
@@ -76,14 +84,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  fvtable-verify=std:vtv_end.o%s} \
%{shared:crtendS.o%s;: %{" PIE_SPEC ":crtendS.o%s} \
%{" NO_PIE_SPEC ":crtend.o%s}} crtn.o%s \
-   %{fopenacc|fopenmp:%:if-exists(crtoffloadend%O%s)}"
+   " CRTOFFLOADEND
 #else
 #define GNU_USER_TARGET_ENDFILE_SPEC \
   "%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_end_preinit.o%s; \
  fvtable-verify=std:vtv_end.o%s} \
%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s \
-   %{fopenacc|fopenmp:%:if-exists(crtoffloadend%O%s)}"
+   " CRTOFFLOADEND
 #endif
 #undef  ENDFILE_SPEC
 #define ENDFILE_SPEC GNU_USER_TARGET_ENDFILE_SPEC
diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c
index a4ea3ac..4ab6397 100644
--- a/libgcc/offloadstuff.c
+++ b/libgcc/offloadstuff.c
@@ -40,7 +40,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #include "tm.h"
 #include "libgcc_tm.h"
 
-#if defined(HAVE_GAS_HIDDEN) && defined(ENABLE_OFFLOADING)
+#if defined(HAVE_GAS_HIDDEN) && ENABLE_OFFLOADING == 1
 
 #define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
 #define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
diff --git a/lto-plugin/lto-pl

[RFC][gomp4] Offloading: Add device initialization and host->target function mapping

2013-12-20 Thread Ilya Verbin
Hi Jakub,

Could you please take a look at this patch for libgomp?

It adds new function GOMP_register_lib, that should be called from every
exec/lib with target regions (that was done in patch [1]).  This function
maintains the array of pointers to the target shared library descriptors.

Also this patch adds target device initialization into GOMP_target and
GOMP_target_data.  At first, it calls "device_init" function from the plugin.
This function takes array of target-images as input, and returns the array of
target-side addresses.  Currently, it always uses the first target-image from
the descriptor, this should be generalized later.  Then libgomp reads the tables
from host-side exec/libs.  After that, it inserts host->target address mapping
into the splay tree.

[1] http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01486.html

Thanks,
-- Ilya

---
 libgomp/libgomp.map |1 +
 libgomp/target.c|  154 ---
 2 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2b64d05..792047f 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -208,6 +208,7 @@ GOMP_3.0 {
 
 GOMP_4.0 {
   global:
+   GOMP_register_lib;
GOMP_barrier_cancel;
GOMP_cancel;
GOMP_cancellation_point;
diff --git a/libgomp/target.c b/libgomp/target.c
index d84a1fa..a37819a 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -84,6 +84,19 @@ struct splay_tree_key_s {
   bool copy_from;
 };
 
+enum library_descr {
+  DESCR_TABLE_START,
+  DESCR_TABLE_END,
+  DESCR_IMAGE_START,
+  DESCR_IMAGE_END
+};
+
+/* Array of pointers to target shared library descriptors.  */
+static void **libraries;
+
+/* Total number of target shared libraries.  */
+static int num_libraries;
+
 /* Array of descriptors of all available devices.  */
 static struct gomp_device_descr *devices;
 
@@ -117,11 +130,16 @@ struct gomp_device_descr
  TARGET construct.  */
   int id;
 
+  /* Set to true when device is initialized.  */
+  bool is_initialized;
+
   /* Plugin file handler.  */
   void *plugin_handle;
 
   /* Function handlers.  */
-  bool (*device_available_func) (void);
+  bool (*device_available_func) (int);
+  void (*device_init_func) (void **, int *, int, void ***, int *);
+  void (*device_run_func) (void *, uintptr_t);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s dev_splay_tree;
@@ -466,6 +484,89 @@ gomp_update (struct gomp_device_descr *devicep, size_t 
mapnum,
   gomp_mutex_unlock (&devicep->dev_env_lock);
 }
 
+void
+GOMP_register_lib (const void *openmp_target)
+{
+  libraries = realloc (libraries, (num_libraries + 1) * sizeof (void *));
+
+  if (libraries == NULL)
+return;
+
+  libraries[num_libraries] = (void *) openmp_target;
+
+  num_libraries++;
+}
+
+static void
+gomp_init_device (struct gomp_device_descr *devicep)
+{
+  void **target_images = malloc (num_libraries * sizeof (void *));
+  int *target_img_sizes = malloc (num_libraries * sizeof (int));
+  if (target_images == NULL || target_img_sizes == NULL)
+gomp_fatal ("Can not allocate memory");
+
+  /* Collect target images from the library descriptors and calculate the total
+ size of host address table.  */
+  int i, host_table_size = 0;
+  for (i = 0; i < num_libraries; i++)
+{
+  void **lib = libraries[i];
+  void **host_table_start = lib[DESCR_TABLE_START];
+  void **host_table_end = lib[DESCR_TABLE_END];
+  /* FIXME: Select the proper target image.  */
+  target_images[i] = lib[DESCR_IMAGE_START];
+  target_img_sizes[i] = lib[DESCR_IMAGE_END] - lib[DESCR_IMAGE_START];
+  host_table_size += host_table_end - host_table_start;
+}
+
+  /* Initialize the target device and receive the address table from target.  
*/
+  void **target_table = NULL;
+  int target_table_size = 0;
+  devicep->device_init_func (target_images, target_img_sizes, num_libraries,
+&target_table, &target_table_size);
+  free (target_images);
+  free (target_img_sizes);
+
+  if (host_table_size != target_table_size)
+gomp_fatal ("Can't map target objects");
+
+  /* Initialize the mapping data structure.  */
+  void **target_entry = target_table;
+  for (i = 0; i < num_libraries; i++)
+{
+  void **lib = libraries[i];
+  void **host_table_start = lib[DESCR_TABLE_START];
+  void **host_table_end = lib[DESCR_TABLE_END];
+  void **host_entry;
+  for (host_entry = host_table_start; host_entry < host_table_end;
+  host_entry += 2, target_entry += 2)
+   {
+ struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+ tgt->refcount = 1;
+ tgt->array = gomp_malloc (sizeof (*tgt->array));
+ tgt->tgt_start = (uintptr_t) *target_entry;
+ tgt->tgt_end = tgt->tgt_start + (uint64_t) *(target_entry+1);
+ tgt->to_free = NULL;
+ tgt->list_count = 0;
+ tgt->device_descr = 

Re: [RFC][gomp4] Offloading: Add device initialization and host->target function mapping

2013-12-26 Thread Ilya Verbin
Ping.
(Patch is slightly updated)

On 20 Dec 21:18, Ilya Verbin wrote:
> Hi Jakub,
> 
> Could you please take a look at this patch for libgomp?
> 
> It adds new function GOMP_register_lib, that should be called from every
> exec/lib with target regions (that was done in patch [1]).  This function
> maintains the array of pointers to the target shared library descriptors.
> 
> Also this patch adds target device initialization into GOMP_target and
> GOMP_target_data.  At first, it calls "device_init" function from the plugin.
> This function takes array of target-images as input, and returns the array of
> target-side addresses.  Currently, it always uses the first target-image from
> the descriptor, this should be generalized later.  Then libgomp reads the 
> tables
> from host-side exec/libs.  After that, it inserts host->target address mapping
> into the splay tree.
> 
> [1] http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01486.html
> 
> Thanks,
> -- Ilya

-- Ilya

---
 libgomp/libgomp.map |1 +
 libgomp/target.c|  154 ---
 2 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2b64d05..792047f 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -208,6 +208,7 @@ GOMP_3.0 {
 
 GOMP_4.0 {
   global:
+   GOMP_register_lib;
GOMP_barrier_cancel;
GOMP_cancel;
GOMP_cancellation_point;
diff --git a/libgomp/target.c b/libgomp/target.c
index d84a1fa..7677c28 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -84,6 +84,19 @@ struct splay_tree_key_s {
   bool copy_from;
 };
 
+enum library_descr {
+  DESCR_TABLE_START,
+  DESCR_TABLE_END,
+  DESCR_IMAGE_START,
+  DESCR_IMAGE_END
+};
+
+/* Array of pointers to target shared library descriptors.  */
+static void **libraries;
+
+/* Total number of target shared libraries.  */
+static int num_libraries;
+
 /* Array of descriptors of all available devices.  */
 static struct gomp_device_descr *devices;
 
@@ -117,11 +130,16 @@ struct gomp_device_descr
  TARGET construct.  */
   int id;
 
+  /* Set to true when device is initialized.  */
+  bool is_initialized;
+
   /* Plugin file handler.  */
   void *plugin_handle;
 
   /* Function handlers.  */
-  bool (*device_available_func) (void);
+  bool (*device_available_func) (int);
+  void (*device_init_func) (void **, int *, int, void ***, int *);
+  void (*device_run_func) (void *, uintptr_t);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s dev_splay_tree;
@@ -466,6 +484,89 @@ gomp_update (struct gomp_device_descr *devicep, size_t 
mapnum,
   gomp_mutex_unlock (&devicep->dev_env_lock);
 }
 
+void
+GOMP_register_lib (const void *openmp_target)
+{
+  libraries = realloc (libraries, (num_libraries + 1) * sizeof (void *));
+
+  if (libraries == NULL)
+return;
+
+  libraries[num_libraries] = (void *) openmp_target;
+
+  num_libraries++;
+}
+
+static void
+gomp_init_device (struct gomp_device_descr *devicep)
+{
+  void **target_images = malloc (num_libraries * sizeof (void *));
+  int *target_img_sizes = malloc (num_libraries * sizeof (int));
+  if (target_images == NULL || target_img_sizes == NULL)
+gomp_fatal ("Can not allocate memory");
+
+  /* Collect target images from the library descriptors and calculate the total
+ size of host address table.  */
+  int i, host_table_size = 0;
+  for (i = 0; i < num_libraries; i++)
+{
+  void **lib = libraries[i];
+  void **host_table_start = lib[DESCR_TABLE_START];
+  void **host_table_end = lib[DESCR_TABLE_END];
+  /* FIXME: Select the proper target image.  */
+  target_images[i] = lib[DESCR_IMAGE_START];
+  target_img_sizes[i] = lib[DESCR_IMAGE_END] - lib[DESCR_IMAGE_START];
+  host_table_size += host_table_end - host_table_start;
+}
+
+  /* Initialize the target device and receive the address table from target.  
*/
+  void **target_table = NULL;
+  int target_table_size = 0;
+  devicep->device_init_func (target_images, target_img_sizes, num_libraries,
+&target_table, &target_table_size);
+  free (target_images);
+  free (target_img_sizes);
+
+  if (host_table_size != target_table_size)
+gomp_fatal ("Can't map target objects");
+
+  /* Initialize the mapping data structure.  */
+  void **target_entry = target_table;
+  for (i = 0; i < num_libraries; i++)
+{
+  void **lib = libraries[i];
+  void **host_table_start = lib[DESCR_TABLE_START];
+  void **host_table_end = lib[DESCR_TABLE_END];
+  void **host_entry;
+  for (host_entry = host_table_start; host_entry < host_table_end;
+  host_entry += 2, target_entry += 2)
+   {
+ struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+ tgt->refcount = 1;
+ 

Re: [gomp4.1 WIP] omp_target_* libgomp APIs

2015-07-13 Thread Ilya Verbin
On Mon, Jul 13, 2015 at 15:17:29 +0200, Jakub Jelinek wrote:
> Here is a new version that I've committed.  I've finished up
> associate/disassociate, wrote a test and tested also with intelmicemul
> offloading.

Great!

> +  k->refcount = INT_MAX;

Shouldn't it be UINTPTR_MAX?

> +  /* FIXME: Support device-to-device somehow?  */

Should libgomp copy data device-host-device if device-device is not supported by
target?  Current liboffloadmic doesn't support this.  I'll find out if there are
any plans.

  -- Ilya


Re: GOMP_offload_register

2015-07-13 Thread Ilya Verbin
On Mon, Jul 13, 2015 at 09:42:50 -0400, Nathan Sidwell wrote:
> GOMP_offload_register's target data argument is 'void *'.  Is there
> any reason it shouldn't be 'const void *'?  It would seem to me that
> that would be better?
> 
> (a cursory look at i386/intelmic-mkoffload.c suggests a lack of
> consts in the variable decls there.  ptx suffers the same problem)

I can't remember any reason, so I agree that const is better (if this works :)

  -- Ilya


Re: [PATCH 2/4] Add liboffloadmic

2015-07-13 Thread Ilya Verbin
On Thu, Jul 09, 2015 at 12:00:29 +0200, Thomas Schwinge wrote:
> I noticed that -- at least with current versions of GCC -- there are
> several compiler diagnostics displayed during the build.  It would be
> nice to get these addressed -- as applicable, presumably in the Intel
> upstream version, and then a new import be done into GCC?  For example, I
> noticed the following changes in my build logs (not a complete list):
> 
> {+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_device.cpp:112:28: 
> warning: invalid suffix on literal; C++11 requires a space between literal 
> and string macro [-Wliteral-suffix]+}
> {+   sprintf (pipe_host_path, "%s"PIPE_HOST_PATH, mic_dir);+}
> {+^+}
> {+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_device.cpp:113:30: 
> warning: invalid suffix on literal; C++11 requires a space between literal 
> and string macro [-Wliteral-suffix]+}
> {+   sprintf (pipe_target_path, "%s"PIPE_TARGET_PATH, mic_dir);+}
> {+  ^+}
> 
> {+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_host.cpp:892:24: 
> warning: invalid suffix on literal; C++11 requires a space between literal 
> and string macro [-Wliteral-suffix]+}
> {+   sprintf (pipes_path, "%s"PIPES_PATH, eng->dir);+}
> {+^+}
> {+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_host.cpp:903:28: 
> warning: invalid suffix on literal; C++11 requires a space between literal 
> and string macro [-Wliteral-suffix]+}
> {+   sprintf (pipe_host_path, "%s"PIPE_HOST_PATH, eng->dir);+}
> {+^+}
> {+[...]/source-gcc/liboffloadmic/runtime/emulator/coi_host.cpp:904:30: 
> warning: invalid suffix on literal; C++11 requires a space between literal 
> and string macro [-Wliteral-suffix]+}
> {+   sprintf (pipe_target_path, "%s"PIPE_TARGET_PATH, eng->dir);+}
> {+  ^+}
> 
> [...]/source-gcc/liboffloadmic/runtime/offload_host.cpp:107:30: warning: 
> [-deprecated conversion from-]{+ISO C++ forbids converting a+} string 
> constant to 'char*' [-Wwrite-strings]
>  static char *timer_envname = "H_TIME";
>   ^
> 
> [...]/source-gcc/liboffloadmic/runtime/offload_myo_host.cpp: In function 
> 'void __intel_cilk_for_32_offload(int, void (*)(void*, void*), int, void*, 
> void*, unsigned int, unsigned int)':
> [...]/source-gcc/liboffloadmic/runtime/offload_myo_host.cpp:762:55: 
> warning: [-deprecated conversion from-]{+ISO C++ forbids converting a+} 
> string constant to 'char*' [-Wwrite-strings]
> args, target_number)
>^
> [...]/source-gcc/liboffloadmic/runtime/offload_myo_host.cpp: In function 
> 'void __intel_cilk_for_64_offload(int, void (*)(void*, void*), int, void*, 
> void*, uint64_t, uint64_t)':
> [...]/source-gcc/liboffloadmic/runtime/offload_myo_host.cpp:815:49: 
> warning: [-deprecated conversion from-]{+ISO C++ forbids converting a+} 
> string constant to 'char*' [-Wwrite-strings]
> target_number)
>  ^
> 
> [...]/source-gcc/liboffloadmic/runtime/offload_orsl.cpp:39:33: warning: 
> [-deprecated conversion from-]{+ISO C++ forbids converting a+} string 
> constant to 'ORSLTag {aka char*}' [-Wwrite-strings]
>  static const ORSLTag   my_tag = "Offload";

Yeah, they are already fixed in the upstream version.  I'll prepare an update
for GCC soon.

  -- Ilya


Re: [gomp4.1 WIP] omp_target_* libgomp APIs

2015-07-13 Thread Ilya Verbin
On Mon, Jul 13, 2015 at 16:03:06 +0200, Jakub Jelinek wrote:
> On Mon, Jul 13, 2015 at 04:38:33PM +0300, Ilya Verbin wrote:
> > On Mon, Jul 13, 2015 at 15:17:29 +0200, Jakub Jelinek wrote:
> > > +  k->refcount = INT_MAX;
> > 
> > Shouldn't it be UINTPTR_MAX?
> 
> Dunno if we can count on it being in stdint.h on all targets.
> Perhaps
> #define REFCOUNT_INFINITY (~(uintptr_t) 0)
> ?

Probably, I don't know.

> > > +  /* FIXME: Support device-to-device somehow?  */
> > 
> > Should libgomp copy data device-host-device if device-device is not 
> > supported by
> > target?  Current liboffloadmic doesn't support this.  I'll find out if 
> > there are
> > any plans.
> 
> There is also the option to spawn an offloaded function that will just call
> memcpy, or have such a function next to the main () of the program that we 
> link
> in.

Do you mean the case when src_devicep == dst_devicep ?  It's easy to support
this by adding new func into plugin, whithout any changes in liboffloadmic.
I thought about memcpy between different devices...

> Also, could you see if the 2 and 3 dimension memcpy_rect couldn't be handled
> more efficiently by liboffloadmic too?
> From what I can see, on the cuda side there is some cudaMemcpy2D and
> cudaMemcpy3D, though I admit I haven't studied in detail what exactly they
> do.

I'll try to find out.

  -- Ilya


Re: [gomp4.1 WIP] omp_target_* libgomp APIs

2015-07-13 Thread Ilya Verbin
On Mon, Jul 13, 2015 at 17:26:43 +0200, Jakub Jelinek wrote:
> > > > > +  /* FIXME: Support device-to-device somehow?  */
> > > > 
> > > > Should libgomp copy data device-host-device if device-device is not 
> > > > supported by
> > > > target?  Current liboffloadmic doesn't support this.  I'll find out if 
> > > > there are
> > > > any plans.
> > > 
> > > There is also the option to spawn an offloaded function that will just 
> > > call
> > > memcpy, or have such a function next to the main () of the program that 
> > > we link
> > > in.
> > 
> > Do you mean the case when src_devicep == dst_devicep ?  It's easy to support
> > this by adding new func into plugin, whithout any changes in liboffloadmic.
> > I thought about memcpy between different devices...
> 
> Well, even src_devicep == dst_devicep does not guarantee it is the same
> device, that is the case only if also src_devicep->target_id ==
> dst_devicep->target_id, right?

Why?  Devices of one type with different target_id's have different entries in
devices[].

> I wouldn't worry about that and just return EINVAL when copying in between
> different devices.

I'll prepare a patch, which will add an interface for copying within one device,
covered by GOMP_OFFLOAD_CAP_OPENMP_400.

  -- Ilya


Re: [gomp4.1 WIP] omp_target_* libgomp APIs

2015-07-13 Thread Ilya Verbin
On Mon, Jul 13, 2015 at 18:50:29 +0300, Ilya Verbin wrote:
> On Mon, Jul 13, 2015 at 17:26:43 +0200, Jakub Jelinek wrote:
> > > > > > +  /* FIXME: Support device-to-device somehow?  */
> > > > > 
> > > > > Should libgomp copy data device-host-device if device-device is not 
> > > > > supported by
> > > > > target?  Current liboffloadmic doesn't support this.  I'll find out 
> > > > > if there are
> > > > > any plans.
> > > > 
> > > > There is also the option to spawn an offloaded function that will just 
> > > > call
> > > > memcpy, or have such a function next to the main () of the program that 
> > > > we link
> > > > in.
> > > 
> > > Do you mean the case when src_devicep == dst_devicep ?  It's easy to 
> > > support
> > > this by adding new func into plugin, whithout any changes in 
> > > liboffloadmic.
> > > I thought about memcpy between different devices...
> > 
> > Well, even src_devicep == dst_devicep does not guarantee it is the same
> > device, that is the case only if also src_devicep->target_id ==
> > dst_devicep->target_id, right?
> 
> Why?  Devices of one type with different target_id's have different entries in
> devices[].
> 
> > I wouldn't worry about that and just return EINVAL when copying in between
> > different devices.
> 
> I'll prepare a patch, which will add an interface for copying within one 
> device,
> covered by GOMP_OFFLOAD_CAP_OPENMP_400.

Here it is.  make check-target-libgomp passed.


libgomp/
* libgomp.h (struct gomp_device_descr): Add dev2dev_func.
* target.c (omp_target_memcpy): Support device-to-device.
(omp_target_memcpy_rect_worker): Likewise.
(omp_target_memcpy_rect): Likewise.
(gomp_load_plugin_for_device): Check for GOMP_OFFLOAD_dev2dev.
* testsuite/libgomp.c/target-12.c (main): Extend for testing
device-to-device memcpy.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_dev2dev): New
function.
* plugin/offload_target_main.cpp (__offload_target_tgt2tgt): New static
function, register it in liboffloadmic.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 8ed1abd..a64b98c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -768,6 +768,7 @@ struct gomp_device_descr
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
+  void *(*dev2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
   /* Splay tree containing information about mapped memory regions.  */
diff --git a/libgomp/target.c b/libgomp/target.c
index 024a9c8..2bfc019 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1329,7 +1329,15 @@ omp_target_memcpy (void *dst, void *src, size_t length, 
size_t dst_offset,
   gomp_mutex_unlock (&src_devicep->lock);
   return 0;
 }
-  /* FIXME: Support device-to-device somehow?  */
+  if (src_devicep == dst_devicep)
+{
+  gomp_mutex_lock (&src_devicep->lock);
+  src_devicep->dev2dev_func (src_devicep->target_id,
+(char *) dst + dst_offset,
+(char *) src + src_offset, length);
+  gomp_mutex_unlock (&src_devicep->lock);
+  return 0;
+}
   return EINVAL;
 }
 
@@ -1364,6 +1372,10 @@ omp_target_memcpy_rect_worker (void *dst, void *src, 
size_t element_size,
src_devicep->dev2host_func (src_devicep->target_id,
(char *) dst + dst_off,
(char *) src + src_off, length);
+  else if (src_devicep == dst_devicep)
+   src_devicep->dev2dev_func (src_devicep->target_id,
+  (char *) dst + dst_off,
+  (char *) src + src_off, length);
   else
return EINVAL;
   return 0;
@@ -1437,10 +1449,6 @@ omp_target_memcpy_rect (void *dst, void *src, size_t 
element_size,
src_devicep = NULL;
 }
 
-  /* FIXME: Support device-to-device somehow?  */
-  if (src_devicep != NULL && dst_devicep != NULL)
-return EINVAL;
-
   if (src_devicep)
 gomp_mutex_lock (&src_devicep->lock);
   else if (dst_devicep)
@@ -1601,10 +1609,10 @@ gomp_load_plugin_for_device (struct gomp_device_descr 
*device,
 }  \
   while (0)
   /* Similar, but missing functions are not an error.  */
-#define DLSYM_OPT(f, n)\
+#define DLSYM_OPT(f, n)   

Re: [gomp4.1] Handle linear clause modifiers in declare simd

2015-07-14 Thread Ilya Verbin
On Wed, Jul 01, 2015 at 12:55:38 +0200, Jakub Jelinek wrote:
>   * cgraph.h (enum cgraph_simd_clone_arg_type): Add
>   SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP,
>   SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP,
>   and SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP.
>   (struct cgraph_simd_clone_arg): Adjust comment.
>   * omp-low.c (simd_clone_clauses_extract): Honor
>   OMP_CLAUSE_LINEAR_KIND.
>   (simd_clone_mangle): Mangle the various linear kinds
>   per the new ABI.
>   (simd_clone_adjust_argument_types): Handle
>   SIMD_CLONE_ARG_TYPE_LINEAR_*_CONSTANT_STEP.
>   (simd_clone_init_simd_arrays): Don't do anything
>   for uval.
>   (simd_clone_adjust): Handle
>   SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP like
>   SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP.
>   Handle SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP.
> c/
>   * c-tree.h (c_finish_omp_clauses): Add declare_simd argument.
>   * c-parser.c (c_parser_omp_clause_linear): Don't handle uval
>   modifier in C.
>   (c_parser_omp_all_clauses): If mask includes uniform clause,
>   pass true to c_finish_omp_clauses' declare_simd.
>   * c-typeck.c (c_finish_omp_clauses): Add declare_simd argument,
>   don't set need_implicitly_determined if it is true.
> cp/
>   * cp-tree.h (finish_omp_clauses): Add declare_simd argument.
>   * parser.c (cp_parser_omp_all_clauses): If mask includes uniform
>   clause, pass true to finish_omp_clauses' declare_simd.
>   * pt.c (apply_late_template_attributes): Pass true to
>   finish_omp_clauses' declare_simd.
>   * semantics.c (finish_omp_clauses): Add declare_simd argument,
>   don't set need_implicitly_determined if it is true.
> testsuite/
>   * gcc.dg/gomp/clause-1.c (foo): Add some linear clause tests.
>   * g++.dg/gomp/clause-3.C (foo): Likewise.
>   * g++.dg/gomp/declare-simd-3.C: New test.

This caused:

gcc/tree-vect-stmts.c: In function ‘bool vectorizable_simd_clone_call(gimple, 
gimple_stmt_iterator*, gimple_statement_base**, slp_tree)’:
gcc/tree-vect-stmts.c:2810:13: error: enumeration value 
‘SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP’ not handled in switch 
[-Werror=switch]
  switch (n->simdclone->args[i].arg_type)
 ^
gcc/tree-vect-stmts.c:2810:13: error: enumeration value 
‘SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP’ not handled in switch 
[-Werror=switch]
gcc/tree-vect-stmts.c:2810:13: error: enumeration value 
‘SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP’ not handled in switch 
[-Werror=switch]
cc1plus: all warnings being treated as errors
make[4]: *** [tree-vect-stmts.o] Error 1

  -- Ilya


Re: [gomp] constify device data & fix cleanup

2015-07-15 Thread Ilya Verbin
2015-07-15 2:59 GMT+03:00 Nathan Sidwell :
> The other thing this does is change the interface between libgommp and the 
> plugin's load_image and unload_image routines.  I've added the ability to 
> return a pointer to target-specific connection data, and have it provided to 
> the unload function.  The ptx routines allocate some storage during loading, 
> but had no way to free it on onloading. (Actually, the unloading was rather 
> broken, attempting to free the wrong thing.)  this data is stashed in the map 
> created for host->target fns & vars.

Why do you need to return dev_data to libgomp?  Is it possible to save
it in plugin, e.g. in some global set with target_data as a key?  I've
implemented unloading this way in plugin-intelmic.

@@ -350,11 +350,11 @@ generate_host_descr_file (const char *ho
"#ifdef __cplusplus\n"
"extern \"C\"\n"
"#endif\n"
-   "void GOMP_offload_register (void *, int, void *);\n"
+   "void GOMP_offload_register (void *, int, const void *);\n"
+   "void GOMP_offload_unregister (void *, int, void const *);\n"
"#ifdef __cplusplus\n"
"extern \"C\"\n"
"#endif\n"
-   "void GOMP_offload_unregister (void *, int, void *);\n\n"

I haven't tried to build intelmic-mkoffload, but looks like here is
something wrong with extern "C".

  -- Ilya


Re: [gomp4.1] Support C++ "this" in OpenMP directives

2015-07-15 Thread Ilya Verbin
On Thu, Jul 09, 2015 at 10:50:14 +0200, Jakub Jelinek wrote:
>   * parser.c (cp_parser_omp_var_list_no_open): Parse this.
>   * cp-tree.h (finish_omp_declare_simd_methods): New prototype.
>   * semantics.c (handle_omp_array_sections_1): Disallow this based
>   array sections for OpenMP.
>   (finish_omp_declare_simd_methods): New function.
>   (finish_omp_clauses): Don't attempt to adjust linear step of
>   this if it points to TYPE_BEING_DEFINED.  Disallow this in
>   all clauses expecting variable lists, except for declare simd
>   linear/uniform/aligned clauses.
>   (finish_struct_1): Call finish_omp_declare_simd_methods.
> 
>   * g++.dg/vect/simd-clone-2.cc: New test.
>   * g++.dg/vect/simd-clone-2.h: New file.
>   * g++.dg/vect/simd-clone-3.cc: New test.
>   * g++.dg/vect/simd-clone-4.cc: New test.
>   * g++.dg/vect/simd-clone-4.h: New file.
>   * g++.dg/vect/simd-clone-5.cc: New test.
>   * g++.dg/gomp/this-1.C: New test.
>   * g++.dg/gomp/this-2.C: New test.

One more warning:

gcc/cp/parser.c: In function ‘tree_node* 
cp_parser_omp_var_list_no_open(cp_parser*, omp_clause_code, tree, bool*)’:
gcc/cp/parser.c:27931:26: error: ‘name’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
  token->location);
  ^
cc1plus: all warnings being treated as errors
make[4]: *** [cp/parser.o] Error 1

  -- Ilya


Re: Tests for libgomp based on OpenMP Examples 4.0.2

2015-07-15 Thread Ilya Verbin
On Wed, Jul 15, 2015 at 12:29:53 +0200, Dominique d'Humières wrote:
> > The patch replaces all FP comparisons with inequalities and epsilons
> > in those tests for libgomp.
> In libgomp/testsuite/libgomp.fortran/examples-4/simd-8.f90
> 
> integer, parameter :: EPS = 0.005
> 
> should be
> 
> real, parameter :: EPS = 0.005

Committed as obvious.


2015-07-15  Maxim Blumenthal  

* testsuite/libgomp.fortran/examples-4/simd-8.f90: (main): Change type
of EPS parameter from integer to real.
* testsuite/libgomp.fortran/examples-4/task_dep-5.f90: (check): Change
type of EPS parameter from integer to real.


diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/simd-8.f90 
b/libgomp/testsuite/libgomp.fortran/examples-4/simd-8.f90
index ba7b0f9..3c7869d 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/simd-8.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/simd-8.f90
@@ -36,7 +36,7 @@ program simd_8f
   implicit none
   real :: pri, arr(1000), diff
   integer :: i
-  integer, parameter :: EPS = 0.005
+  real, parameter :: EPS = 0.005
 
   do i = 1, 1000
  P(i)   = i
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/task_dep-5.f90 
b/libgomp/testsuite/libgomp.fortran/examples-4/task_dep-5.f90
index f12b42c..0746531 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/task_dep-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/task_dep-5.f90
@@ -44,7 +44,7 @@ contains
   subroutine check (N, A, B)
 integer :: N
 integer :: i, j
-integer, parameter :: EPS = 0.01
+real, parameter :: EPS = 0.01
 real, dimension(N,N) :: A, B
 real :: diff
 do i = 1, N


  -- Ilya


Re: [gomp] Fix PTX unloading

2015-07-15 Thread Ilya Verbin
On Wed, Jul 15, 2015 at 14:36:45 -0400, Nathan Sidwell wrote:
> -= devicep->load_image_func (devicep->target_id, target_data, 
> &target_table);
> += devicep->load_image_func (devicep->target_id, target_data,
> + &target_table);

It was exactly 80 chars long :)

  -- Ilya


Re: Constify host-side offload data`

2015-07-16 Thread Ilya Verbin
On Wed, Jul 15, 2015 at 20:56:50 -0400, Nathan Sidwell wrote:
> Index: gcc/config/nvptx/mkoffload.c
> ===
> -  fprintf (out, "extern void *__OFFLOAD_TABLE__[];\n\n");
> +  fprintf (out, "extern const void *conat __OFFLOAD_TABLE__[];\n\n");

Here is a typo.

  -- Ilya


Re: constify target offload data

2015-07-17 Thread Ilya Verbin
On Thu, Jul 16, 2015 at 16:08:47 -0400, Nathan Sidwell wrote:
> Jakub, Ilya,
> this patch against trunk constifies the offload target data.  I'm
> having difficulty building an intelmic toolchain, so the changes
> there aren't tested. Ilya, if you could check them, that'd be great.

Works fine with one change:


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 136fb99..baa4945 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -61,7 +61,7 @@ typedef std::vector AddrVect;
 typedef std::vector DevAddrVect;
 
 /* Addresses for all images and all devices.  */
-typedef std::map ImgDevAddrMap;
+typedef std::map ImgDevAddrMap;
 
 /* Image descriptor needed by __offload_[un]register_image.  */
 struct TargetImageDesc {


  -- Ilya


Re: [gomp4.1] Fix linear-2.{c,C} testcases

2015-07-17 Thread Ilya Verbin
On Fri, Jul 17, 2015 at 15:54:13 +0200, Jakub Jelinek wrote:
> These tests had a thinko, computation performed on the offloaded copy of the
> a variable, but then tested on the host side, without #pragma omp target
> update or similar.
> Fixed thusly.

In my testing linear-2.C still causes SIGSEGV on target in f1:

   0x76fc3872 <_Z2f1IiEvRT_._omp_fn.29(void)>:  push   %rbp
   0x76fc3873 <_Z2f1IiEvRT_._omp_fn.29(void)+1>:mov%rsp,%rbp
   0x76fc3876 <_Z2f1IiEvRT_._omp_fn.29(void)+4>:push   %rbx
   0x76fc3877 <_Z2f1IiEvRT_._omp_fn.29(void)+5>:sub$0x48,%rsp
   0x76fc387b <_Z2f1IiEvRT_._omp_fn.29(void)+9>:mov%rdi,-0x48(%rbp)
   0x76fc387f <_Z2f1IiEvRT_._omp_fn.29(void)+13>:   lea-0x34(%rbp),%rax
   0x76fc3883 <_Z2f1IiEvRT_._omp_fn.29(void)+17>:   mov%rax,-0x18(%rbp)
   0x76fc3887 <_Z2f1IiEvRT_._omp_fn.29(void)+21>:   mov-0x48(%rbp),%rax
   0x76fc388b <_Z2f1IiEvRT_._omp_fn.29(void)+25>:   mov(%rax),%rax
   0x76fc388e <_Z2f1IiEvRT_._omp_fn.29(void)+28>:   mov(%rax),%rax
=> 0x76fc3891 <_Z2f1IiEvRT_._omp_fn.29(void)+31>:   mov(%rax),%edx

(gdb) x $rax
0x7fff537fc1ec: Cannot access memory at address 0x7fff537fc1ec

Probably something wasn't mapped.


> I'm still seeing
> FAIL: libgomp.c/for-5.c (internal compiler error)
> FAIL: libgomp.c/for-5.c (test for excess errors)
> FAIL: libgomp.c++/for-13.C (internal compiler error)
> FAIL: libgomp.c++/for-13.C (test for excess errors)
> which is some LTO ICE.

I've never seen such ICEs before...

  -- Ilya


Re: [gomp4.1] Initial support for some OpenMP 4.1 construct parsing

2015-07-17 Thread Ilya Verbin
On Thu, Jun 25, 2015 at 22:10:58 +0200, Jakub Jelinek wrote:
> On Thu, Jun 25, 2015 at 10:45:29PM +0300, Ilya Verbin wrote:
> > So, as I understood, three tasks will be generated almost simultaneously in
> > foo1: one on host and two on target.
> > Target task 1 will be executed immediately.
> > Host task will wait for task 1 to be completed on target.
> > (Or it is not possible to mix "omp target" and "omp task" dependencies?)
> > And task 2 will wait on target for task 1.
> 
> My understanding is that you don't create any extra tasks,
> but rather you pointer translate the host address from the start of the
> variable (or array section; thus the depend clause argument) into
> target address, and check if it can be offloaded right away (no need
> to wait for dependencies).  If yes, you just offload it, with nowait
> without waiting in the caller till it finishes.  If not, you arrange
> that when some other offloaded job finishes that provides the dependency,
> your scheduled job is executed.
> So, the task on the target is the implicit one, what executes the
> body of the target region.
> In tasking (task.c) dependencies are only honored for sibling tasks,
> whether the different target implicit tasks are sibling is questionable and
> supposedly should be clarified, but I can't imagine they aren't meant to.
> So, you don't really need to care about the task.c dependencies, target.c
> could have its own ones if it is easier to write it that way.
> Supposedly for nowait you want to spawn or queue the job and return right
> away, and for queued job stick it into some data structure (supposedly
> inside of libgomp on the host) that when the library is (asynchronously)
> notified that some offloaded job finished you check the data structures
> and spawn something different.  Or have the data structures on the offloaded
> device instead?
> 
> In any case, I'd look what the Mentor folks are doing for OpenACC async
> offloading, what libmicoffload allows you to do and figure out something
> from that.

One big question is who will maintain the list of scheduled job, its
dependencies, etc. - libgomp or each target plugin?


OpenACC has async queues:
#pragma acc parallel async(2) wait(1)

But it's not possible to have 2 waits like:
#pragma acc parallel async(3) wait(1) wait(2)

(GOMP_OFFLOAD_openacc_async_wait_async has only one argument with the number of
queue to wait)

Thomas, please correct me if I'm wrong.

In this regard, OpenMP is more complicated, since it allows e.g.:
#pragma omp target nowait depend(in: a, b) depend(out: c, d)

Currently I'm trying to figure out what liboffloadmic can do.


BTW, do you plan to remove GOMP_MAP_POINTER mappings from array sections?
The enter/exit patch for libgomp depends on this change.

  -- Ilya


Re: [gomp4.1] Initial support for some OpenMP 4.1 construct parsing

2015-07-20 Thread Ilya Verbin
On Fri, Jul 17, 2015 at 18:43:06 +0200, Jakub Jelinek wrote:
> On Fri, Jul 17, 2015 at 07:31:36PM +0300, Ilya Verbin wrote:
> > One big question is who will maintain the list of scheduled job, its
> > dependencies, etc. - libgomp or each target plugin?
> > 
> > 
> > OpenACC has async queues:
> > #pragma acc parallel async(2) wait(1)
> > 
> > But it's not possible to have 2 waits like:
> > #pragma acc parallel async(3) wait(1) wait(2)
> > 
> > (GOMP_OFFLOAD_openacc_async_wait_async has only one argument with the 
> > number of
> > queue to wait)
> > 
> > Thomas, please correct me if I'm wrong.
> > 
> > In this regard, OpenMP is more complicated, since it allows e.g.:
> > #pragma omp target nowait depend(in: a, b) depend(out: c, d)
> 
> If it is each plugin, then supposedly it should use (if possible) some
> common libgomp routine to maintain the queues, duplicating the dependency
> graph handling code in each plugins might be too ugly.
> 
> > Currently I'm trying to figure out what liboffloadmic can do.

Latest liboffloadmic (I'm preparing an update for trunk) can take some pointer
*ptr* as argument of __offload_offload, which is used for execution and data
transfer.  When given job is finished, it will call some callback in libgomp on
host, passing *ptr* back to it, thus libgomp can distinguish which job has
been finished.  BTW, which word to use here to avoid confusion? (task? job?)

I'm going to prototype something in libgomp using this interface.

  -- Ilya


Re: fix gomp offload routine unloading

2015-07-21 Thread Ilya Verbin
On Tue, Jul 21, 2015 at 08:15:41 -0400, Nathan Sidwell wrote:
> On 07/21/15 05:51, Jakub Jelinek wrote:
> >On Mon, Jul 20, 2015 at 07:08:55PM -0400, Nathan Sidwell wrote:
> >>2015-07-20  Nathan Sidwell  
> >>
> >>libgomp/
> >>* target.c (gomp_offload_image_to_device): Rename to ...
> >>(gomp_load_image_to_device): ... here.
> >>(GOMP_offload_register): Adjust call.
> >>(gomp_init_device): Likewise.
> >>(gomp_unload_image_from_device): New.  Broken out of ...
> >>(GOMP_offload_unregister): ... here.  Call it.
> >>(gomp_unload_device): New.
> >>* libgomp.h (gomp_unload_device): Declare.
> >>* oacc-init.c (acc_shutdown_1): Unload from device before deleting
> >>mem maps.
> >>
> >>gcc/
> >>* config/nvptx/mkoffload.c (process): Add destructor call.
> >
> >Ok if also tested on Intel MIC, with a few changes:
> 
> Ilya, are you able to test Intel MIC for me?

I don't see any regressions on MIC.

  -- Ilya


Re: [PATCH 3/4] Add libgomp plugin for Intel MIC

2015-07-23 Thread Ilya Verbin
On Wed, Jul 08, 2015 at 16:16:44 +0200, Thomas Schwinge wrote:
> > --- /dev/null
> > +++ b/liboffloadmic/plugin/Makefile.am
> > @@ -0,0 +1,123 @@
> > +# Plugin for offload execution on Intel MIC devices.
> 
> > +main_target_image.h: offload_target_main
> > +   @echo -n "const int image_size = " > $@
> > +   @stat -c '%s' $< >> $@
> > +   @echo ";" >> $@
> > +   @echo "struct MainTargetImage {" >> $@
> > +   @echo "  int64_t size;" >> $@
> > +   @echo "  char name[sizeof \"offload_target_main\"];" >> $@
> > +   @echo "  char data[image_size];" >> $@
> > +   @echo "};" >> $@
> > +   @echo "extern \"C\" const MainTargetImage main_target_image = {" >> $@
> > +   @echo "  image_size, \"offload_target_main\"," >> $@
> > +   @cat $< | xxd -include >> $@
> > +   @echo "};" >> $@
> > +
> > +offload_target_main: $(liboffload_dir)/ofldbegin.o offload_target_main.o 
> > $(liboffload_dir)/ofldend.o
> > +   $(CXX) $(AM_LDFLAGS) $^ -o $@
> > +
> > +offload_target_main.o: offload_target_main.cpp
> > +   $(CXX) $(AM_CXXFLAGS) $(AM_CPPFLAGS) -c $< -o $@
> 
> Here, I note that the xxd tool is being used, which in my distribution is
> part of the Vim editor's package, which -- as far as I know -- is not
> currently declared as a build dependency of GCC?

We have a patch, which checks for xxd availability, is it ok for trunk?


2015-07-23  Maxim Blumenthal  

* configure.ac: Add a check for xxd presence when the target is
intelmic or intelmicemul.
* configure: Regenerate.


diff --git a/configure b/configure
index 5ba9489..bd8fed8 100755
--- a/configure
+++ b/configure
[regenerate]

diff --git a/configure.ac b/configure.ac
index 2ff9be0..63eebfc 100644
--- a/configure.ac
+++ b/configure.ac
@@ -494,6 +494,17 @@ else
 fi])
 AC_SUBST(extra_liboffloadmic_configure_flags)
 
+# Check if xxd is present in the system
+# when the target is intelmic or intelmicemul.
+case "${target}" in
+  *-intelmic-* | *-intelmicemul-*)
+AC_CHECK_PROG(xxd_present, xxd, "yes", "no")
+if test "$xxd_present" = "no"; then
+  AC_MSG_ERROR([cannot find xxd])
+fi
+;;
+esac
+
 # Save it here so that, even in case of --enable-libgcj, if the Java
 # front-end isn't enabled, we still get libgcj disabled.
 libgcj_saved=$libgcj


  -- Ilya


Re: [gomp4] acc routines bugfix

2015-07-24 Thread Ilya Verbin
On Fri, Jul 24, 2015 at 08:05:00 -0700, Cesar Philippidis wrote:
> The second point is interesting. Offloaded functions require the "omp
> target" attribute or that function won't reach the lto compiler. That's
> fine because not all targets can handle general code. The problem occurs
> when a user forgets to bless a function as offloaded, which OpenACC
> allows. This patch teaches the lto-wrapper to error on unrecognized
> functions with flag_openacc or hit gcc_unreachable otherwise. I couldn't
> think of a way to test the lto error message because that involves
> having two compilers present. I wonder if it's ok to have libgomp check
> for compiler expected compiler errors? However, that's more of a
> gcc/testsuite type of check.
> 
> I don't think trunk has much support for acc routines just yet, so I
> applied this patch to gomp-4_0-branch for now.

OpenMP has similar issue.

> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 97585c9..bc589bd 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -1219,9 +1219,23 @@ input_overwrite_node (struct lto_file_decl_data 
> *file_data,
>LDPR_NUM_KNOWN);
>node->instrumentation_clone = bp_unpack_value (bp, 1);
>node->split_part = bp_unpack_value (bp, 1);
> -  gcc_assert (flag_ltrans
> -   || (!node->in_other_partition
> -   && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> + && !node->used_from_other_partition);
> +
> +  if (!success)
> +{
> +  if (flag_openacc)
> + {
> +   if (TREE_CODE (node->decl) == FUNCTION_DECL)
> + error ("Missing routine function %<%s%>", node->name ());
> +   else
> + error ("Missing declared variable %<%s%>", node->name ());
> + }
> +
> +  else
> + gcc_unreachable ();
> +}
>  }

This will print an error not only when a fn/var, referenced from offload region,
missed its attribute, but also when something goes wrong in general LTO
partitioning (if flag_openacc is set).  So, maybe just replace gcc_assert ()
with error () without checking for flag_openacc?

And how about similar assert in input_varpool_node?

  -- Ilya


Re: [gomp4] acc routines bugfix

2015-07-24 Thread Ilya Verbin
On Fri, Jul 24, 2015 at 17:24:55 +0200, Jakub Jelinek wrote:
> On Fri, Jul 24, 2015 at 06:21:34PM +0300, Ilya Verbin wrote:
> > On Fri, Jul 24, 2015 at 08:05:00 -0700, Cesar Philippidis wrote:
> > > The second point is interesting. Offloaded functions require the "omp
> > > target" attribute or that function won't reach the lto compiler. That's
> > > fine because not all targets can handle general code. The problem occurs
> > > when a user forgets to bless a function as offloaded, which OpenACC
> > > allows. This patch teaches the lto-wrapper to error on unrecognized
> > > functions with flag_openacc or hit gcc_unreachable otherwise. I couldn't
> > > think of a way to test the lto error message because that involves
> > > having two compilers present. I wonder if it's ok to have libgomp check
> > > for compiler expected compiler errors? However, that's more of a
> > > gcc/testsuite type of check.
> > > 
> > > I don't think trunk has much support for acc routines just yet, so I
> > > applied this patch to gomp-4_0-branch for now.
> > 
> > OpenMP has similar issue.
> 
> Well, only for variables.  For functions the spec does not require anything
> like that, you can supply the functions in some other way.
> Generally, e.g. libc or libm functions aren't all #pragma omp declare target
> marked, yet they are usually allowed.

Library functions have node->in_other_partition = 0 and
node->used_from_other_partition = 0, I've just tried fprintf in a target region.
Only local functions without "declare target", but referenced from another
function with "declare target" get incorrect values here.

  -- Ilya


Re: offload data version number

2015-07-24 Thread Ilya Verbin
On Fri, Jul 24, 2015 at 18:30:16 +0200, Jakub Jelinek wrote:
> On Fri, Jul 24, 2015 at 09:32:04AM -0400, Nathan Sidwell wrote:
> > On 07/21/15 11:21, Nathan Sidwell wrote:
> > >On 07/21/15 09:25, Nathan Sidwell wrote:
> > >>This trunk patch implements new register and unregister entry points to 
> > >>allow
> > >>specifying data version information.  (I'll shortly be posting patches 
> > >>changing
> > >>the PTX offload data format.)
> > >>
> > >>We now have GOMP_offload_{,un}register_2, which take an additional 
> > >>unsigned int
> > >>version number.  The version number is composed of two parts.  16 bits 
> > >>for the
> > >>libgomp version and 16 bits for the device-specific plugin.  Currently 
> > >>both are
> > >>zero.  When the PTX data changes, that device-specific value will 
> > >>increment.
> > >>
> > >>The existing register/unregister calls forward to the new ones, providing 
> > >>zero
> > >>for the version information.
> > >>
> > >>On the plugin side I've added 2 new entry points 
> > >>GOMP_OFFLOAD_{,un}load_image_2,
> > >>which also take an additional version number argument.  These entry 
> > >>points are
> > >>optional, and only added to the PTX plugin.  The existing plugin 
> > >>entrypoints
> > >>forward to the new ones.
> > >>
> > >>libgomp  will use these new entry points if they exist, otherwise use the
> > >>original entry points, provided the incoming version is zero.
> > >>
> > >>I added the GOMP_offload_{,un}register_2 routines to the libgomp map file 
> > >>as
> > >>version 4.0.2 -- I wasn't sure whether to increment it more than that. 
> > >>Advice
> > >>sought.
> > >
> > >this version is updated following committing the unload patch.  there were 
> > >a few
> > >(expected) collisions.
> > 
> > I committed a version to gomp4 branch, but would still like to get this to
> > trunk ASAP.
> 
> So there is no version anywhere?  I remember in the design ideas the plan
> was that the data section containing the target info (that originally has
> been meant to be passed as GOMP_target parameter, but later on has been
> changed to the register/unregister approach) will contain some header that
> would include version number, some flags and details on the payload.
> Do you mean that right now the data section (or pointer passed to the
> register functions) only contains the raw bits (ELF DSO for Intel MIC and
> PTX text files for NVPTX), rather than some header?
> How do you determine the size of the bits?

Yes, currently there is no version in target info, which is passed to register
function.  In case of MIC, this header contains only 2 fields: start and end of
the target image.

  -- Ilya


Re: offload data version number

2015-07-24 Thread Ilya Verbin
On Fri, Jul 24, 2015 at 15:26:38 -0400, Nathan Sidwell wrote:
> this version makes the following changes to the earlier version.
> 
> *) Renames things to FOO_ver, rather than FOO_2
> 
> *) No attempt to deal with cross-version plugins and libgomp.
> 
> *) Adds GOMP_OFFLOAD_version function to plugin. (I went with your
> approach). Returns the GOMP_VERSION used to build the plugin, which
> libgomp checks matches the value for its build.  When we make
> incompatible changes to the plugin interface, that value can be
> incremented.
> 
> *) While working on gomp_load_plugin_for_device, I noticed the DLSYM
> and DLSYM_OPT macros were somewhat funky.  We're loading functions,
> so don't expect a NULL value.  We can simply check the returned
> value and only need dlerror when we get NULL.  The counting that
> DLSYM_OPT does was somewhat funky too.  IMHO better for that macro
> to simply return a truth value.

I do not know whether this is a good idea, but it's possible to add some magic
number into mkoffload:process () like:

865   fprintf (out, "static const void *target_data[] = {\n");
866   fprintf (out, "  MAGIC, VERSION, ptx_code, (void*) %u, var_mappings, 
(void*) %u, "
867 "func_mappings\n", nvars, nfuncs);
868   fprintf (out, "};\n\n");

So, libgomp will be able to check target_data in GOMP_offload_register.
If MAGIC is present, it can check the VERSION, the plugin also can check the
version in a similar way.  This hack allows to avoid new versions of
GOMP_*_ver in libgomp and GOMP_OFFLOAD_*_ver in plugins.

  -- Ilya


Re: [gomp4.1] Support #pragma omp target {enter,exit} data

2015-07-29 Thread Ilya Verbin
On Mon, Jul 06, 2015 at 22:42:10 +0200, Jakub Jelinek wrote:
> As has been clarified on omp-lang, we actually shouldn't be mapping or
> unmapping the pointer and/or reference, only the array slice itself, except
> in target construct (and even for that it is changing from mapping to
> private + pointer assignment).

I've updated this patch.  make check-target-libgomp passed.


libgomp/
* target.c (gomp_map_vars_existing): Fix target address for 'always to'
array sections.
(gomp_unmap_vars): Decrement k->refcount when it is 1 and
k->async_refcount is 0.
(gomp_offload_image_to_device): Set tgt's refcount to infinity.
(gomp_exit_data): New static function.
(GOMP_target_enter_exit_data): Support mapping/unmapping.
* testsuite/libgomp.c/target-11.c: Extend for testing 'always to' array
sections.
* testsuite/libgomp.c/target-20.c: New test.


diff --git a/libgomp/target.c b/libgomp/target.c
index ef74d43..ad375c9 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -191,7 +191,8 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep, 
splay_tree_key oldn,
 
   if (GOMP_MAP_ALWAYS_TO_P (kind))
 devicep->host2dev_func (devicep->target_id,
-   (void *) (oldn->tgt->tgt_start + oldn->tgt_offset),
+   (void *) (oldn->tgt->tgt_start + oldn->tgt_offset
+ + newn->host_start - oldn->host_start),
(void *) newn->host_start,
newn->host_end - newn->host_start);
   if (oldn->refcount != REFCOUNT_INFINITY)
@@ -664,15 +665,18 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool 
do_copyfrom)
continue;
 
   bool do_unmap = false;
-  if (k->refcount > 1)
+  if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
+   k->refcount--;
+  else if (k->refcount == 1)
{
- if (k->refcount != REFCOUNT_INFINITY)
-   k->refcount--;
+ if (k->async_refcount > 0)
+   k->async_refcount--;
+ else
+   {
+ k->refcount--;
+ do_unmap = true;
+   }
}
-  else if (k->async_refcount > 0)
-   k->async_refcount--;
-  else
-   do_unmap = true;
 
   if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
  || tgt->list[i].always_copy_from)
@@ -798,7 +802,7 @@ gomp_offload_image_to_device (struct gomp_device_descr 
*devicep,
   /* Insert host-target address mapping into splay tree.  */
   struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
   tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array));
-  tgt->refcount = 1;
+  tgt->refcount = REFCOUNT_INFINITY;
   tgt->tgt_start = 0;
   tgt->tgt_end = 0;
   tgt->to_free = NULL;
@@ -1241,6 +1245,62 @@ GOMP_target_update (int device, const void *unused, 
size_t mapnum,
   gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
+static void
+gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
+   void **hostaddrs, size_t *sizes, unsigned short *kinds)
+{
+  const int typemask = 0xff;
+  size_t i;
+  gomp_mutex_lock (&devicep->lock);
+  for (i = 0; i < mapnum; i++)
+{
+  struct splay_tree_key_s cur_node;
+  unsigned char kind = kinds[i] & typemask;
+  switch (kind)
+   {
+   case GOMP_MAP_FROM:
+   case GOMP_MAP_ALWAYS_FROM:
+   case GOMP_MAP_DELETE:
+   case GOMP_MAP_RELEASE:
+ cur_node.host_start = (uintptr_t) hostaddrs[i];
+ cur_node.host_end = cur_node.host_start + sizes[i];
+ splay_tree_key k = splay_tree_lookup (&devicep->mem_map, &cur_node);
+ if (!k)
+   continue;
+
+ if (k->refcount > 0 && k->refcount != REFCOUNT_INFINITY)
+   k->refcount--;
+ if (kind == GOMP_MAP_DELETE && k->refcount != REFCOUNT_INFINITY)
+   k->refcount = 0;
+
+ if ((kind == GOMP_MAP_FROM && k->refcount == 0)
+ || kind == GOMP_MAP_ALWAYS_FROM)
+   devicep->dev2host_func (devicep->target_id,
+   (void *) cur_node.host_start,
+   (void *) (k->tgt->tgt_start + k->tgt_offset
+ + cur_node.host_start
+ - k->host_start),
+   cur_node.host_end - cur_node.host_start);
+ if (k->refcount == 0)
+   {
+ splay_tree_remove (&devicep->mem_map, k);
+ if (k->tgt->refcount > 1)
+   k->tgt->refcount--;
+ else
+   gomp_unmap_tgt (k->tgt);
+   }
+
+ break;
+   default:
+ gomp_mutex_unlock (&devicep->lock);
+ gomp_fatal ("GOMP_target_enter_exit_data unhandled kind 0x%.2x",
+ kind);
+   }
+}
+
+  gomp_mutex_unlock (&devicep->lock);
+}
+
 void
 GOMP_target_enter_exit_data 

Re: [gomp4.1] Support #pragma omp target {enter,exit} data

2015-07-30 Thread Ilya Verbin
On Thu, Jul 30, 2015 at 10:12:59 +0200, Jakub Jelinek wrote:
> This test will fail on HSA, you don't assume just that it doesn't
> fallback to host, but also non-shared address space.
> I think it would be better to start with some check for non-shared address
> space, like:
> /* This test relies on non-shared address space.  Punt otherwise.  */
> void ensure_nonshared_as (void)
> {
>   int a = 8;
>   #pragma omp target map(to:a)
>   {
> a++;
>   }
>   if (a == 8)
> exit (0);
> }
> 
> And generally, it is better to have most of the tests not relying on
> offloading only or even non-shared address space, so that we also test
> shared address space and host fallback.  But a few tests won't hurt...

Sure, but it's not possible to fully test data mapping without non-shared
address space.  I've created new check_effective_target, ok for gomp-4_1-branch?


* testsuite/lib/libgomp.exp
(check_effective_target_offload_device_nonshared_as): New.
* testsuite/libgomp.c++/examples-4/e.53.2.C: Replace offload_device with
offload_device_nonshared_as.
* testsuite/libgomp.c/target-11.c: Ditto.


diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index 438777f..3a29b78 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -320,6 +320,19 @@ proc check_effective_target_offload_device { } {
 } ]
 }
 
+# Return 1 if offload device is available and it has non-shared address space.
+proc check_effective_target_offload_device_nonshared_as { } {
+return [check_runtime_nocache offload_device_nonshared_as {
+  int main ()
+   {
+ int a = 8;
+ #pragma omp target map(to: a)
+   a++;
+ return a != 8;
+   }
+} ]
+}
+
 # Return 1 if at least one nvidia board is present.
 
 proc check_effective_target_openacc_nvidia_accel_present { } {
diff --git a/libgomp/testsuite/libgomp.c++/examples-4/e.53.2.C 
b/libgomp/testsuite/libgomp.c++/examples-4/e.53.2.C
index 75276e7..6d5b5e4 100644
--- a/libgomp/testsuite/libgomp.c++/examples-4/e.53.2.C
+++ b/libgomp/testsuite/libgomp.c++/examples-4/e.53.2.C
@@ -1,5 +1,5 @@
 // { dg-do run }
-// { dg-require-effective-target offload_device }
+// { dg-require-effective-target offload_device_nonshared_as }
 
 #include 
 
diff --git a/libgomp/testsuite/libgomp.c/target-11.c 
b/libgomp/testsuite/libgomp.c/target-11.c
index b86097a..ed6a17a 100644
--- a/libgomp/testsuite/libgomp.c/target-11.c
+++ b/libgomp/testsuite/libgomp.c/target-11.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 


  -- Ilya


Re: [gomp4.1] Support #pragma omp target {enter,exit} data

2015-07-30 Thread Ilya Verbin
On Thu, Jul 30, 2015 at 10:12:59 +0200, Jakub Jelinek wrote:
> On Wed, Jul 29, 2015 at 10:06:52PM +0300, Ilya Verbin wrote:
> > @@ -1241,6 +1245,62 @@ GOMP_target_update (int device, const void *unused, 
> > size_t mapnum,
> >gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
> >  }
> >  
> > +static void
> > +gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
> > +   void **hostaddrs, size_t *sizes, unsigned short *kinds)
> > +{
> > +  const int typemask = 0xff;
> > +  size_t i;
> > +  gomp_mutex_lock (&devicep->lock);
> > +  for (i = 0; i < mapnum; i++)
> > +{
> > +  struct splay_tree_key_s cur_node;
> > +  unsigned char kind = kinds[i] & typemask;
> > +  switch (kind)
> > +   {
> > +   case GOMP_MAP_FROM:
> > +   case GOMP_MAP_ALWAYS_FROM:
> > +   case GOMP_MAP_DELETE:
> > +   case GOMP_MAP_RELEASE:
> 
> Please handle here GOMP_MAP_ZERO_LEN_ARRAY_SECTION too.
> It should use gomp_map_lookup (while all others splay_tree_lookup),
> otherwise it is the same as GOMP_MAP_RELEASE.

Done.

> > @@ -1280,13 +1337,20 @@ GOMP_target_enter_exit_data (int device, size_t 
> > mapnum, void **hostaddrs,
> >  }
> >  
> >if (is_enter_data)
> > -{
> > -  /* TODO  */
> > -}
> > +for (i = 0; i < mapnum; i++)
> > +  {
> > +   struct target_mem_desc *tgt_var
> > + = gomp_map_vars (devicep, 1, &hostaddrs[i], NULL, &sizes[i],
> > +  &kinds[i], true, false);
> > +   tgt_var->refcount--;
> > +
> > +   /* If the variable was already mapped, tgt_var is not needed.  Otherwise
> > +  tgt_var will be freed by gomp_unmap_vars or gomp_exit_data.  */
> > +   if (tgt_var->refcount == 0)
> > + free (tgt_var);
> 
> This is racy, you don't hold the device lock here anymore, so you shouldn't
> decrease refcounts or test it etc.
> I think better would be to change the bool is_target argument to
> gomp_map_vars into an enum, and use 3 values there for now
> - GOMP_VARS_MAP_TARGET, GOMP_VARS_MAP_DATA, GOMP_VARS_MAP_ENTER_DATA or so,
> and for GOMP_VARS_MAP_ENTER_DATA perform the decrement of refcount and
> freeing if it is zero (but then also better return NULL).

Fixed.

> > diff --git a/libgomp/testsuite/libgomp.c/target-20.c 
> > b/libgomp/testsuite/libgomp.c/target-20.c
> > new file mode 100644
> > index 000..ec7e245
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-20.c
> > @@ -0,0 +1,111 @@
> > +/* { dg-require-effective-target offload_device } */
> 
> This test will fail on HSA, you don't assume just that it doesn't
> fallback to host, but also non-shared address space.

Fixed.

make check-target-libgomp passed.  ok?


libgomp/
* libgomp.h (enum gomp_map_vars_kind): New.
(gomp_map_vars): Change type of the argument from bool to enum
gomp_map_vars_kind.
* oacc-mem.c (acc_map_data, present_create_copy,
gomp_acc_insert_pointer): Pass GOMP_MAP_VARS_OPENACC instead of false to
gomp_map_vars.
* oacc-parallel.c (GOACC_parallel, GOACC_data_start): Likewise.
* target.c (gomp_map_vars_existing): Fix target address for 'always to'
array sections.
(gomp_map_vars): Change type of the argument from bool to enum
gomp_map_vars_kind, fixup its usage.  Set tgt->refcount to 0 if called
from GOMP_target_enter_exit_data.  Free tgt if called from
GOMP_target_enter_exit_data and nothing has been mapped.
(gomp_unmap_vars): Decrement k->refcount when it is 1 and
k->async_refcount is 0.
(gomp_offload_image_to_device): Set tgt's refcount to infinity.
(GOMP_target, GOMP_target_41): Pass GOMP_MAP_VARS_TARGET instead of true
to gomp_map_vars.
(gomp_target_data_fallback, GOMP_target_data, GOMP_target_data_41): Pass
GOMP_MAP_VARS_DATA instead of false to gomp_map_vars.
(gomp_exit_data): New static function.
(GOMP_target_enter_exit_data): Support mapping/unmapping.
* testsuite/libgomp.c/target-11.c: Extend for testing 'always to' array
sections.
* testsuite/libgomp.c/target-20.c: New test.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 707acaf..9031649 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -787,12 +787,22 @@ struct gomp_device_descr
   acc_dispatch_t openacc;
 };
 
+/* Kind of the pragma, for which gomp_map_vars () is called.  */
+enum gomp_map_vars_kind
+{
+  GOMP_MAP_VARS_OPENACC,
+  GOMP_MAP_VARS_TARGET,
+  GOMP_MAP_VARS_DATA,
+  GOMP_MAP_VARS_ENTER_DATA
+};
+

Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2015-07-31 Thread Ilya Verbin
On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> We had established the use of a boolean flag have_offload in gcc::context
> to indicate whether during compilation, we've actually seen any code to
> be offloaded (see cited below the relevant parts of the patch by Ilya et
> al.).  This means that currently, the whole offload machinery will not be
> run unless we actually have any offloaded data.  This means that the
> configured mkoffload programs (-foffload=[...], defaulting to
> configure-time --enable-offload-targets=[...]) will not be invoked unless
> we actually have any offloaded data.  This means that we will not
> actually generate constructor code to call libgomp's
> GOMP_offload_register unless we actually have any offloaded data.

Yes, that was the plan.

> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> targets have been specified during compilation.
> 
> But: at runtime, I'd like to know which -foffload=[...] targets have been
> specified during compilation, so that we can, for example, reliably
> resort to host fallback execution for -foffload=disable instead of
> getting error message that an offloaded function is missing.

It's easy to fix:

diff --git a/libgomp/target.c b/libgomp/target.c
index a5fb164..f81d570 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
*devicep,
   k.host_end = k.host_start + 1;
   splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
   gomp_mutex_unlock (&devicep->lock);
-  if (tgt_fn == NULL)
-   gomp_fatal ("Target function wasn't mapped");
-
   return (void *) tgt_fn->tgt_offset;
 }
 }
@@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void 
*unused,
 return gomp_target_fallback (fn, hostaddrs);
 
   void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+  if (fn_addr == NULL)
+return gomp_target_fallback (fn, hostaddrs);
 
   struct target_mem_desc *tgt_vars
 = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
@@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t 
mapnum,
 }
 
   void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+  if (fn_addr == NULL)
+return gomp_target_fallback (fn, hostaddrs);
 
   struct target_mem_desc *tgt_vars
 = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,


> other hand, for example, for -foffload=nvptx-none, even if user program
> code doesn't contain any offloaded data (and thus the offload machinery
> has not been run), the user program might still contain any executable
> directives or OpenACC runtime library calls, so we'd still like to use
> the libgomp nvptx plugin.  However, we currently cannot detect this
> situation.
> 
> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> configuration in the executable (as a string, for example) for libgomp to
> look that up, or b) make it a requirement that (if configured via
> -foffload=[...]), the offload machinery is run even if there is not
> actually any data to be offloaded, so we then reliably get the respective
> constructor call to libgomp's GOMP_offload_register.  I once began to
> implement a), but this to get a big ugly, so then looked into b) instead.
> Compared to the status quo, always running the whole offloading machinery
> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> are active, certainly does introduce some overhead when there isn't
> actually any code to be offloaded, so I'm not sure whether that is
> acceptable?

I vote for (a).

  -- Ilya


Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)

2015-07-31 Thread Ilya Verbin
Hi!

I've noticed that target MIC compiler from trunk hangs forever in
lto_input_mode_table in this loop, even on simple testcases.

On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
+  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
+if not found, fallback to all modes.  */
+  int pass;
+  for (pass = 0; pass < 2; pass++)
+   for (machine_mode mr = pass ? VOIDmode
+   : GET_CLASS_NARROWEST_MODE (mclass);
+pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
+pass ? mr = (machine_mode) (m + 1)
+ : mr = GET_MODE_WIDER_MODE (mr))
+ if (GET_MODE_CLASS (mr) != mclass
+ || GET_MODE_SIZE (mr) != size
+ || GET_MODE_PRECISION (mr) != prec
+ || GET_MODE_INNER (mr) != inner
+ || GET_MODE_IBIT (mr) != ibit
+ || GET_MODE_FBIT (mr) != fbit
+ || GET_MODE_NUNITS (mr) != nunits)
+   continue;

Given that gomp-4_1-branch works ok, the problem was introduced somewhere
between 9 and 31 Jul.  I'll try to find the revision.

  -- Ilya


Re: Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)

2015-07-31 Thread Ilya Verbin
On Fri, Jul 31, 2015 at 19:27:58 +0300, Ilya Verbin wrote:
> I've noticed that target MIC compiler from trunk hangs forever in
> lto_input_mode_table in this loop, even on simple testcases.
> 
> On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
> +  if not found, fallback to all modes.  */
> +  int pass;
> +  for (pass = 0; pass < 2; pass++)
> + for (machine_mode mr = pass ? VOIDmode
> + : GET_CLASS_NARROWEST_MODE (mclass);
> +  pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> +  pass ? mr = (machine_mode) (m + 1)
> +   : mr = GET_MODE_WIDER_MODE (mr))
> +   if (GET_MODE_CLASS (mr) != mclass
> +   || GET_MODE_SIZE (mr) != size
> +   || GET_MODE_PRECISION (mr) != prec
> +   || GET_MODE_INNER (mr) != inner
> +   || GET_MODE_IBIT (mr) != ibit
> +   || GET_MODE_FBIT (mr) != fbit
> +   || GET_MODE_NUNITS (mr) != nunits)
> + continue;
> 
> Given that gomp-4_1-branch works ok, the problem was introduced somewhere
> between 9 and 31 Jul.  I'll try to find the revision.

Shouldn't 'mr' be here instead of 'm'?

> mr = (machine_mode) (m + 1)

  -- Ilya


Re: Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)

2015-07-31 Thread Ilya Verbin
On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
> On Fri, Jul 31, 2015 at 07:53:16PM +0300, Ilya Verbin wrote:
> > On Fri, Jul 31, 2015 at 19:27:58 +0300, Ilya Verbin wrote:
> > > I've noticed that target MIC compiler from trunk hangs forever in
> > > lto_input_mode_table in this loop, even on simple testcases.
> > > 
> > > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > > +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
> > > +  if not found, fallback to all modes.  */
> > > +  int pass;
> > > +  for (pass = 0; pass < 2; pass++)
> > > + for (machine_mode mr = pass ? VOIDmode
> > > + : GET_CLASS_NARROWEST_MODE (mclass);
> > > +  pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > > +  pass ? mr = (machine_mode) (m + 1)
> > > +   : mr = GET_MODE_WIDER_MODE (mr))
> > > +   if (GET_MODE_CLASS (mr) != mclass
> > > +   || GET_MODE_SIZE (mr) != size
> > > +   || GET_MODE_PRECISION (mr) != prec
> > > +   || GET_MODE_INNER (mr) != inner
> > > +   || GET_MODE_IBIT (mr) != ibit
> > > +   || GET_MODE_FBIT (mr) != fbit
> > > +   || GET_MODE_NUNITS (mr) != nunits)
> > > + continue;
> > > 
> > > Given that gomp-4_1-branch works ok, the problem was introduced somewhere
> > > between 9 and 31 Jul.  I'll try to find the revision.
> > 
> > Shouldn't 'mr' be here instead of 'm'?
> 
> I think so.  If it works, patch preapproved.

It fixes the infinite loop, but causes an error:
lto1: fatal error: unsupported mode QI

> But wonder what changed that we haven't been triggering it before.
> What mode do you think it on (mclass/size/prec/inner/ibit/fbit/nunits)?

When in hangs, mr is HImode.

  -- Ilya


Re: Regression in target MIC compiler

2015-08-04 Thread Ilya Verbin
On Tue, Aug 04, 2015 at 14:35:11 +0200, Thomas Schwinge wrote:
> On Fri, 31 Jul 2015 20:13:02 +0300, Ilya Verbin  wrote:
> > On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
> > > > > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > > > > +  /* First search just the GET_CLASS_NARROWEST_MODE to wider 
> > > > > modes,
> > > > > +  if not found, fallback to all modes.  */
> > > > > +  int pass;
> > > > > +  for (pass = 0; pass < 2; pass++)
> > > > > + for (machine_mode mr = pass ? VOIDmode
> > > > > + : GET_CLASS_NARROWEST_MODE (mclass);
> > > > > +  pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > > > > +  pass ? mr = (machine_mode) (m + 1)
> > > > > +   : mr = GET_MODE_WIDER_MODE (mr))
> > > > > +   if (GET_MODE_CLASS (mr) != mclass
> > > > > +   || GET_MODE_SIZE (mr) != size
> > > > > +   || GET_MODE_PRECISION (mr) != prec
> > > > > +   || GET_MODE_INNER (mr) != inner
> > > > > +   || GET_MODE_IBIT (mr) != ibit
> > > > > +   || GET_MODE_FBIT (mr) != fbit
> > > > > +   || GET_MODE_NUNITS (mr) != nunits)
> > > > > + continue;
> > > > > 
> > > > > Given that gomp-4_1-branch works ok, the problem was introduced 
> > > > > somewhere
> > > > > between 9 and 31 Jul.  I'll try to find the revision.
> > > > 
> > > > Shouldn't 'mr' be here instead of 'm'?
> > > 
> > > I think so.  If it works, patch preapproved.
> > 
> > It fixes the infinite loop, but causes an error:
> > lto1: fatal error: unsupported mode QI
> 
> Confirmed.
> 
> > > But wonder what changed that we haven't been triggering it before.
> > > What mode do you think it on (mclass/size/prec/inner/ibit/fbit/nunits)?
> > 
> > When in hangs, mr is HImode.
> 
> Do you already have any further analysis, a workaround, or even a fix?

Not yet.  I thought since Jakub is the author of this function, he could easily
point what is wrong here :)  Actually, intelmic doesn't require
lto_input_mode_table, so temporary workaround is just to disable it.

  -- Ilya


Re: Regression in target MIC compiler

2015-08-04 Thread Ilya Verbin
On Tue, Aug 04, 2015 at 16:07:42 +0200, Richard Biener wrote:
> On Tue, Aug 4, 2015 at 3:06 PM, Ilya Verbin  wrote:
> > On Tue, Aug 04, 2015 at 14:35:11 +0200, Thomas Schwinge wrote:
> >> On Fri, 31 Jul 2015 20:13:02 +0300, Ilya Verbin  wrote:
> >> > On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
> >> > > > > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> >> > > > > +  /* First search just the GET_CLASS_NARROWEST_MODE to wider 
> >> > > > > modes,
> >> > > > > +  if not found, fallback to all modes.  */
> >> > > > > +  int pass;
> >> > > > > +  for (pass = 0; pass < 2; pass++)
> >> > > > > + for (machine_mode mr = pass ? VOIDmode
> >> > > > > + : GET_CLASS_NARROWEST_MODE 
> >> > > > > (mclass);
> >> > > > > +  pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> >> > > > > +  pass ? mr = (machine_mode) (m + 1)
> >> > > > > +   : mr = GET_MODE_WIDER_MODE (mr))
> >> > > > > +   if (GET_MODE_CLASS (mr) != mclass
> >> > > > > +   || GET_MODE_SIZE (mr) != size
> >> > > > > +   || GET_MODE_PRECISION (mr) != prec
> >> > > > > +   || GET_MODE_INNER (mr) != inner
> >> > > > > +   || GET_MODE_IBIT (mr) != ibit
> >> > > > > +   || GET_MODE_FBIT (mr) != fbit
> >> > > > > +   || GET_MODE_NUNITS (mr) != nunits)
> >> > > > > + continue;
> >> > > > >
> >> > > > > Given that gomp-4_1-branch works ok, the problem was introduced 
> >> > > > > somewhere
> >> > > > > between 9 and 31 Jul.  I'll try to find the revision.
> >> > > >
> >> > > > Shouldn't 'mr' be here instead of 'm'?
> >> > >
> >> > > I think so.  If it works, patch preapproved.
> 
> ^^^
> 
> looks like an obvious error anyway.
> 
> Richard.

Yeah, but the fix for this typo doesn't really help, since it exposes another
error in this function.

vvv

> >> > It fixes the infinite loop, but causes an error:
> >> > lto1: fatal error: unsupported mode QI
> >>
> >> Confirmed.
> >>
> >> > > But wonder what changed that we haven't been triggering it before.
> >> > > What mode do you think it on (mclass/size/prec/inner/ibit/fbit/nunits)?
> >> >
> >> > When in hangs, mr is HImode.
> >>
> >> Do you already have any further analysis, a workaround, or even a fix?
> >
> > Not yet.  I thought since Jakub is the author of this function, he could 
> > easily
> > point what is wrong here :)  Actually, intelmic doesn't require
> > lto_input_mode_table, so temporary workaround is just to disable it.

  -- Ilya


Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2015-08-05 Thread Ilya Verbin
On Wed, Aug 05, 2015 at 10:40:44 +0200, Richard Biener wrote:
> On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin  wrote:
> > On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> >> We had established the use of a boolean flag have_offload in gcc::context
> >> to indicate whether during compilation, we've actually seen any code to
> >> be offloaded (see cited below the relevant parts of the patch by Ilya et
> >> al.).  This means that currently, the whole offload machinery will not be
> >> run unless we actually have any offloaded data.  This means that the
> >> configured mkoffload programs (-foffload=[...], defaulting to
> >> configure-time --enable-offload-targets=[...]) will not be invoked unless
> >> we actually have any offloaded data.  This means that we will not
> >> actually generate constructor code to call libgomp's
> >> GOMP_offload_register unless we actually have any offloaded data.
> >
> > Yes, that was the plan.
> >
> >> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> >> targets have been specified during compilation.
> >>
> >> But: at runtime, I'd like to know which -foffload=[...] targets have been
> >> specified during compilation, so that we can, for example, reliably
> >> resort to host fallback execution for -foffload=disable instead of
> >> getting error message that an offloaded function is missing.
> >
> > It's easy to fix:
> >
> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index a5fb164..f81d570 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
> > *devicep,
> >k.host_end = k.host_start + 1;
> >splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
> >gomp_mutex_unlock (&devicep->lock);
> > -  if (tgt_fn == NULL)
> > -   gomp_fatal ("Target function wasn't mapped");
> > -
> >return (void *) tgt_fn->tgt_offset;
> >  }
> >  }
> > @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const 
> > void *unused,
> >  return gomp_target_fallback (fn, hostaddrs);
> >
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +return gomp_target_fallback (fn, hostaddrs);
> >
> >struct target_mem_desc *tgt_vars
> >  = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
> > @@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), 
> > size_t mapnum,
> >  }
> >
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +return gomp_target_fallback (fn, hostaddrs);
> >
> >struct target_mem_desc *tgt_vars
> >  = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
> >
> >
> >> other hand, for example, for -foffload=nvptx-none, even if user program
> >> code doesn't contain any offloaded data (and thus the offload machinery
> >> has not been run), the user program might still contain any executable
> >> directives or OpenACC runtime library calls, so we'd still like to use
> >> the libgomp nvptx plugin.  However, we currently cannot detect this
> >> situation.
> >>
> >> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> >> configuration in the executable (as a string, for example) for libgomp to
> >> look that up, or b) make it a requirement that (if configured via
> >> -foffload=[...]), the offload machinery is run even if there is not
> >> actually any data to be offloaded, so we then reliably get the respective
> >> constructor call to libgomp's GOMP_offload_register.  I once began to
> >> implement a), but this to get a big ugly, so then looked into b) instead.
> >> Compared to the status quo, always running the whole offloading machinery
> >> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> >> are active, certainly does introduce some overhead when there isn't
> >> actually any code to be offloaded, so I'm not sure whether that is
> >> acceptable?
> >
> > I vote for (a).
> 
> What happens for conflicting -fofffload=[...] options in different TUs?

If you're asking about what happens now, only the list of offload targets from
link-time -foffload=tgt1,tgt2 option matters.

I don't like plan (b) because it calls ipa_write_summaries unconditionally for
all OpenMP programs, which creates IR sections, which increases filesize and may
cause other problems, e.g. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63868>.
Also compile-time is increased because of LTO machinery, mkoffloads, etc.

If OpenACC requires some registration in libgomp even without offload, maybe you
can run this machinery only under flag_openacc?

  -- Ilya


Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2015-09-21 Thread Ilya Verbin
2015-09-21 18:15 GMT+03:00 Thomas Schwinge :
> (, "--foffload* undocumented", has recently
> been filed.)
>
> (In the following, "intelmic" is short for
> "x86_64-intelmicemul-linux-gnu", and "nvptx" is short for "nvptx-none".)
>
> What is the syntax to use for building both intelmic and nvptx offloading
> code?  I understand we allow for separate -foffload=intelmic
> -foffload=nvptx options.  Do we also intend to allow
> -foffload=intelmic,nvptx or -foffload=intelmic:nvptx?
>
> And then, we allow for specifying offloading compiler options with
> -foffload=intelmic=[...] and -foffload=nvptx=[...]; do we also intend to
> allow -foffload=intelmic,nvptx=[...] (do the options apply to nvptx only,
> or to both intelmic and nvptx?), and/or
> -foffload=intelmic=[...],nvptx=[...], and/or
> -foffload=intelmic:nvptx=[...] (which "looks a bit like" the options
> ought to apply to nvptx only -- or to both intelmic and nvptx?), and/or
> -foffload=intelmic=[...]:nvptx=[...]?

The plan was:

1. -foffload=intelmic,nvptx=[...]  <- apply options to both intelmic,nvptx.
   Just like -foffload=[...] applies to both targets (if configured so).
2. -foffload=intelmic=[...],nvptx=[...]  <- is not allowed.
3. To apply different options to different targets, one should pass:
   -foffload=intelmic=[...] -foffload=nvptx=[...].

>   3612/* Check that GCC is configured to support the offload 
> target.  */
>   3613c = OFFLOAD_TARGETS;
>   3614while (c)
>   3615  {
>   3616n = strchr (c, ',');
>   3617if (n == NULL)
>   3618  n = strchr (c, '\0');
>   3619
>   3620if (next - cur == n - c && strncmp (target, c, n - c) 
> == 0)
>   3621  break;
>   3622
>   3623c = *n ? n + 1 : NULL;
>   3624  }
>   3625
>   3626if (!c)
>   3627  fatal_error (input_location,
>   3628   "GCC is not configured to support %s as 
> offload target",
>   3629   target);
>
> So, this code will not do the right thing when configured with
> --enable-offload-targets=intelmic,nvptx (thus,
> OFFLOAD_TARGETS=intelmic:nvptx): using -foffload=nvptx will then result
> in "xgcc: fatal error: GCC is not configured to support nvptx as offload
> target".
>
> If I'm understanding the following code correctly, this supports the idea
> that the intention has been for -foffload=[targets]=[options] to separate
> the targets by commas, and separate the options by spaces -- is that
> correct?

Yes, targets are separated by commas, options are the whole string after the
equal sign, spaces inside are allowed.

  -- Ilya


Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit

2015-09-24 Thread Ilya Verbin
On Thu, Aug 27, 2015 at 21:44:50 +0800, Chung-Lin Tang wrote:
> We've discovered that, for several of the libgomp plugin interface routines,
> if the target specific routine calls exit() (usually upon a fatal condition),
> deadlock ensues. We found this using nvptx, but it's possible on intelmic as 
> well.
> 
> This is due to many of the plugin routines are called with the device lock 
> held,
> and when exit() is called inside the plugin code, the GOMP_unregister_var() 
> destructor
> tries to iterate through and acquire all device locks to cleanup. Since we 
> already hold
> one of the device locks, this just gets stuck.  Also because gomp_mutex_t is a
> simple futex based lock implementation (instead of pthreads), we don't have a
> trylock mechanism to use either.
> 
> So this patch tries to alleviate this problem by changing the plugin 
> interface;
> the plugin routines that are called while holding the device lock are adjusted
> to assume to never fatal exit, but return a value back to libgomp proper to
> indicate execution results. The core libgomp code then may unlock and call 
> gomp_fatal().
> 
> We believe this is the right route to solve the problem, since there's only
> two accel target plugins so far. Besides the nvptx plugin, I have made some 
> effort
> to update the intelmic plugin as well, though it's not as thoroughly audited.
> Intel folks might want to further make sure your plugin code is free of this 
> problem as well.
> 
> This patch contains the libgomp proper changes. The nvptx and intelmic 
> patches follow.
> I have tested the libgomp testsuite without regressions for both accel 
> targets, is this
> okay for trunk?

(I have no objections)

However, in case of intelmic, these exit()s are just the tip of the iceberg,
because underlying liboffloadmic contains other exit()s at fatal errors.
And I don't know what to do with such deadlocks.

  -- Ilya


Re: libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)

2015-09-25 Thread Ilya Verbin
On Fri, Sep 25, 2015 at 18:21:27 +0200, Thomas Schwinge wrote:
> On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin  wrote:
> > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > (if those are run from ctors, then I guess something like dl_load_lock
> > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > performed at the same time) in accessing/reallocating offload_images
> > > and num_offload_images and the lack of support to register further
> > > images after the gomp_target_init call (if you dlopen further shared
> > > libraries) is really bad.  And it would be really nice to support the
> > > unloading.
> 
> > Here is the latest patch for libgomp and mic plugin.
> 
> What about the scenario where one thread is inside
> GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
> shared library with such a mkoffload-generated constructor) and is
> modifying offload_images with register_lock held, and another thread is
> inside a GOMP_target* construct -> gomp_init_device and is accessing
> offload_images without register_lock held?  Or, why isn't that a
> reachable scenario?
> 
> Would the following patch (untested) do the right thing (locking added to
> gomp_init_device and gomp_unload_device)?  We can then also remove the
> is_register_lock parameter from gomp_load_image_to_device, and simplify
> the code.

Looks like you're right, and this scenario is possible.

  -- Ilya


Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Ilya Verbin
On Mon, Sep 28, 2015 at 12:09:19 +0200, Bernd Schmidt wrote:
> On 09/28/2015 12:03 PM, Bernd Schmidt wrote:
> >On 09/28/2015 10:26 AM, Thomas Schwinge wrote:
> >>-  objcopy_argv[8] = NULL;
> >>+  objcopy_argv[objcopy_argc++] = NULL;
> >>+  gcc_checking_assert (objcopy_argc <= OBJCOPY_ARGC_MAX);
> >
> >On its own this is not an improvement - you're trading a compile time
> >error for a runtime error. So, what is the other change this is
> >preparing for?
> 
> Ok, I now see the other patch. But I also see that other code in the same
> file and in the nvptx mkoffload is using the obstack_ptr_grow method to
> build argv arrays, I think that would be preferrable to this.

I've removed obstack_ptr_grow for arrays with known sizes after this review:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html

  -- Ilya


[PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-28 Thread Ilya Verbin
Hi!

Currently the COI emulator is single-threaded, i.e. it is able to run only one
target function at a time, e.g. the following testcase:

  #pragma omp parallel sections num_threads(2)
{
  #pragma omp section
  #pragma omp target
  while (1)
putchar ('.');

  #pragma omp section
  #pragma omp target
  while (1)
putchar ('o');
}

prints only dots using emul, while using real libcoi it prints:
...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
Of course, it's not possible to test new OpenMP 4.1's async features using such
an emulator.

The patch bellow makes it asynchronous, it creates an auxiliary thread for each
COIPipeline in host and in target processes.  In general, a new COIPipeline is
created by liboffloadmic for each host thread with offload, i.e. the example
above has:
4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
3 threads in the target process (1 main thread + 2 auxiliary threads).
An auxiliary host thread runs a target function in the new thread in target
process and waits for its completion.  When the function is finished, the host
thread signals an event and can run a callback, if it is registered.
liboffloadmic waits for signalled events by calling COIEventWait.
This is identical to how real libcoi works.

make check-target-libgomp and some internal tests did not show any regression.
TSan report is clean.  Is it OK for trunk?


liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (OFFLOAD_ACTIVE_WAIT_ENV): New
define.
(init): Set OFFLOAD_ACTIVE_WAIT env var to 0, if it is not set.
* runtime/emulator/coi_common.h (PIPE_HOST_PATH): Replace with ...
(PIPE_HOST2TGT_NAME): ... this.
(PIPE_TARGET_PATH): Replace with ...
(PIPE_TGT2HOST_NAME): ... this.
(MALLOCN): New define.
(READN): Likewise.
(WRITEN): Likewise.
(enum cmd_t): Replace CMD_RUN_FUNCTION with CMD_PIPELINE_RUN_FUNCTION.
Add CMD_PIPELINE_CREATE, CMD_PIPELINE_DESTROY.
* runtime/emulator/coi_device.cpp (engine_dir): New static variable.
(pipeline_thread_routine): New static function.
(COIProcessWaitForShutdown): Use global engine_dir instead of mic_dir.
Rename pipe_host and pipe_target to pipe_host2tgt and pipe_tgt2host.
If cmd is CMD_PIPELINE_CREATE, create a new thread for the pipeline.
Remove cmd == CMD_RUN_FUNCTION case.
* runtime/emulator/coi_device.h (COIERRORN): New define.
* runtime/emulator/coi_host.cpp: Include set, map, queue.
Replace typedefs with enums and structs.
(struct Function): Remove name, add num_buffers, bufs_size,
bufs_data_target, misc_data_len, misc_data, return_value_len,
return_value, completion_event.
(struct Callback): New.
(struct Process): Remove pipeline.  Add pipe_host2tgt and pipe_tgt2host.
(struct Pipeline): Remove pipe_host and pipe_target.  Add thread,
destroy, is_destroyed, pipe_host2tgt_path, pipe_tgt2host_path,
pipe_host2tgt, pipe_tgt2host, queue, process.
(max_pipeline_num): New static variable.
(pipelines): Likewise.
(max_event_num): Likewise.
(non_signalled_events): Likewise.
(errored_events): Likewise.
(callbacks): Likewise.
(cleanup): Do not check tmp_dirs before free.
(start_critical_section): New static function.
(finish_critical_section): Likewise.
(pipeline_is_destroyed): Likewise.
(maybe_invoke_callback): Likewise.
(signal_event): Likewise.
(get_event_result): Likewise.
(COIBufferCopy): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.  Signal completion
event.
(COIBufferCreate): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.
(COIBufferCreateFromMemory): Rename arguments according to headers.
Add asserts.
(COIBufferDestroy): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.
(COIBufferGetSinkAddress): Rename arguments according to headers.
Add asserts.
(COIBufferMap): Rename arguments according to headers.  Add asserts.
Signal completion event.
(COIBufferRead): Likewise.
(COIBufferSetState): Likewise.
(COIBufferUnmap): Likewise.
(COIBufferWrite): Likewise.
(COIEngineGetCount): Add assert.
(COIEngineGetHandle): Rename arguments according to headers.
Add assert.
(COIEventWait): Rename arguments according to headers.  Add asserts.
Implement waiting for events with zero or infinite timeout.
(COIEventRegisterCallback): New function.
(pipeline_thread_routine): New static function.
(COIPipelineCr

[PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-09-28 Thread Ilya Verbin
Committed to trunk as obvious.

PR other/67652
* runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.

diff --git a/liboffloadmic/runtime/offload_engine.cpp 
b/liboffloadmic/runtime/offload_engine.cpp
index 16b440d..00b673a 100644
--- a/liboffloadmic/runtime/offload_engine.cpp
+++ b/liboffloadmic/runtime/offload_engine.cpp
@@ -173,7 +173,7 @@ void Engine::init_process(void)
 // use putenv instead of setenv as Windows has no setenv.
 // Note: putenv requires its argument can't be freed or modified.
 // So no free after call to putenv or elsewhere.
-char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 
1));
+char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
 sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
 putenv(env_var);  
 }

  -- Ilya


Re: [PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-09-28 Thread Ilya Verbin
On Mon, Sep 28, 2015 at 18:15:14 +0200, Jakub Jelinek wrote:
> On Mon, Sep 28, 2015 at 07:10:13PM +0300, Ilya Verbin wrote:
> > Committed to trunk as obvious.
> > 
> > PR other/67652
> > * runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.
> > 
> > diff --git a/liboffloadmic/runtime/offload_engine.cpp 
> > b/liboffloadmic/runtime/offload_engine.cpp
> > index 16b440d..00b673a 100644
> > --- a/liboffloadmic/runtime/offload_engine.cpp
> > +++ b/liboffloadmic/runtime/offload_engine.cpp
> > @@ -173,7 +173,7 @@ void Engine::init_process(void)
> >  // use putenv instead of setenv as Windows has no setenv.
> >  // Note: putenv requires its argument can't be freed or 
> > modified.
> >  // So no free after call to putenv or elsewhere.
> > -char * env_var = (char*) 
> > malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 1));
> > +char * env_var = (char*) 
> > malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
> >  sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
> >  putenv(env_var);  
> 
> Missing error handling if malloc returns NULL?

Yes :(
I will grep all mallocs/reallocs one more time.

  -- Ilya


Re: [PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-29 Thread Ilya Verbin
On Tue, Sep 29, 2015 at 09:01:33 +0200, Jakub Jelinek wrote:
> On Mon, Sep 28, 2015 at 05:53:42PM +0300, Ilya Verbin wrote:
> > Currently the COI emulator is single-threaded, i.e. it is able to run only 
> > one
> > target function at a time, e.g. the following testcase:
> > 
> >   #pragma omp parallel sections num_threads(2)
> > {
> >   #pragma omp section
> >   #pragma omp target
> >   while (1)
> > putchar ('.');
> > 
> >   #pragma omp section
> >   #pragma omp target
> >   while (1)
> > putchar ('o');
> > }
> > 
> > prints only dots using emul, while using real libcoi it prints:
> > ...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
> > Of course, it's not possible to test new OpenMP 4.1's async features using 
> > such
> > an emulator.
> > 
> > The patch bellow makes it asynchronous, it creates an auxiliary thread for 
> > each
> > COIPipeline in host and in target processes.  In general, a new COIPipeline 
> > is
> > created by liboffloadmic for each host thread with offload, i.e. the example
> > above has:
> > 4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
> > 3 threads in the target process (1 main thread + 2 auxiliary threads).
> > An auxiliary host thread runs a target function in the new thread in target
> > process and waits for its completion.  When the function is finished, the 
> > host
> > thread signals an event and can run a callback, if it is registered.
> > liboffloadmic waits for signalled events by calling COIEventWait.
> > This is identical to how real libcoi works.
> > 
> > make check-target-libgomp and some internal tests did not show any 
> > regression.
> > TSan report is clean.  Is it OK for trunk?
> 
> For now ok.  Though, I'd say I'd prefer if there were no auxiliary threads
> on the host side, just whatever thread is asked to send something to/from
> the device, wait for something and/or poll for something just polling the
>
> pipes.  Are there auxiliary host threads also for the case when using
> the real COI, offloading to hw?

Yes.

  -- Ilya


Re: [gomp4.1] Doacross tweaks

2015-09-30 Thread Ilya Verbin
Hi!

On Fri, Sep 25, 2015 at 18:54:47 +0200, Jakub Jelinek wrote:
> --- gcc/tree-pretty-print.c.jj2015-09-03 16:35:58.0 +0200
> +++ gcc/tree-pretty-print.c   2015-09-25 15:04:46.911844111 +0200
> @@ -569,7 +569,9 @@ dump_omp_clause (pretty_printer *pp, tre
>   if (TREE_PURPOSE (t) != integer_zero_node)
> {
>   tree p = TREE_PURPOSE (t);
> - if (!wi::neg_p (p, TYPE_SIGN (TREE_TYPE (p
> + if (OMP_CLAUSE_DEPEND_SINK_NEGATIVE (t))
> +   pp_minus (pp);
> + else
> pp_plus (pp);
>   dump_generic_node (pp, TREE_PURPOSE (t), spc, flags,
>  false);

This caused a warning:

gcc/tree-pretty-print.c: In function ‘void dump_omp_clause(pretty_printer*, 
tree, int, int)’:
gcc/tree-pretty-print.c:571:12: error: unused variable ‘p’ 
[-Werror=unused-variable]
   tree p = TREE_PURPOSE (t);
^

  -- Ilya


Re: [gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-10-02 Thread Ilya Verbin
Hi!

On Tue, Sep 08, 2015 at 11:20:14 +0200, Jakub Jelinek wrote:
> nowait support for #pragma omp target is not implemented yet, supposedly we
> need to mark those somehow (some flag) already in the struct gomp_task
> structure, essentially it will need either 2 or 3 callbacks
> (the current one, executed when the dependencies are resolved (it actually
> waits until some thread schedules it after that point, I think it is
> undesirable to run it with the tasking lock held), which would perform
> the gomp_map_vars and initiate the running of the region, and then some
> query routine which would poll the plugin whether the task is done or not,
> and either perform the finalization (unmap_vars) if it is done (and in any
> case return bool whether it should be polled again or not), and if the
> finalization is not done there, also another callback for the finalization.
> Also, there is the issue that if we are waiting for task that needs to be
> polled, and we don't have any further tasks to run, we shouldn't really
> attempt to sleep on some semaphore (e.g. in taskwait, end of
> taskgroup, etc.) or barrier, but rather either need to keep polling it, or
> call the query hook with some argument that it should sleep in there until
> the work is done by the offloading device.
> Also, there needs to be a way for the target nowait first callback to say
> that it is using host fallback and thus acts as a normal task, therefore
> once the task fn finishes, the task is done.

Here is my WIP patch.  target.c part is obviously incorrect, but it demonstrates
a possible libgomp <-> plugin interface for running a target task function
asynchronously and checking whether it is completed or not.
(Refactored liboffloadmic/runtime/emulator from trunk is required to run
target-tmp.c testcase.)


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d798321..8e2b5aa 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -872,6 +872,8 @@ struct gomp_device_descr
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void *(*dev2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
+  void (*async_run_func) (int, void *, void *, const void *);
+  bool (*async_is_completed_func) (int, const void *);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s mem_map;
diff --git a/libgomp/target.c b/libgomp/target.c
index 77bd442..31f034c 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -45,6 +45,10 @@
 #include "plugin-suffix.h"
 #endif
 
+/* FIXME: TMP */
+#include 
+#include 
+
 static void gomp_target_init (void);
 
 /* The whole initialization code for offloading plugins is only run one.  */
@@ -1227,6 +1231,44 @@ gomp_target_fallback (void (*fn) (void *), void 
**hostaddrs)
   *thr = old_thr;
 }
 
+/* Host fallback with firstprivate map-type handling.  */
+
+static void
+gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
+  void **hostaddrs, size_t *sizes,
+  unsigned short *kinds)
+{
+  size_t i, tgt_align = 0, tgt_size = 0;
+  char *tgt = NULL;
+  for (i = 0; i < mapnum; i++)
+if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
+  {
+   size_t align = (size_t) 1 << (kinds[i] >> 8);
+   if (tgt_align < align)
+ tgt_align = align;
+   tgt_size = (tgt_size + align - 1) & ~(align - 1);
+   tgt_size += sizes[i];
+  }
+  if (tgt_align)
+{
+  tgt = gomp_alloca (tgt_size + tgt_align - 1);
+  uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
+  if (al)
+   tgt += tgt_align - al;
+  tgt_size = 0;
+  for (i = 0; i < mapnum; i++)
+   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
+ {
+   size_t align = (size_t) 1 << (kinds[i] >> 8);
+   tgt_size = (tgt_size + align - 1) & ~(align - 1);
+   memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
+   hostaddrs[i] = tgt + tgt_size;
+   tgt_size = tgt_size + sizes[i];
+ }
+}
+  gomp_target_fallback (fn, hostaddrs);
+}
+
 /* Helper function of GOMP_target{,_41} routines.  */
 
 static void *
@@ -1311,40 +1353,19 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t 
mapnum,
   if (devicep == NULL
   || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
 {
-  size_t i, tgt_align = 0, tgt_size = 0;
-  char *tgt = NULL;
-  for (i = 0; i < mapnum; i++)
-   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
- {
-   size_t align = (size_t) 1 << (kinds[i] >> 8);
-   if (tgt_align < align)
- tgt_align = align;
-   tgt_size = (tgt_size + align - 1) & ~(align - 1);
-   tgt_size += sizes[i];
- }
-  if (tgt_align)
-   {
- tgt = gomp_alloca (tgt_size + tgt_align - 1);
- uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
- if (al)
-   tgt += tgt_align - al;
- tgt_size = 0;
- 

Re: [PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-10-08 Thread Ilya Verbin
On Mon, Sep 28, 2015 at 18:15:14 +0200, Jakub Jelinek wrote:
> > -char * env_var = (char*) 
> > malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 1));
> > +char * env_var = (char*) 
> > malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
> >  sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
> >  putenv(env_var);  
> 
> Missing error handling if malloc returns NULL?

Fixed.

On Mon, Sep 28, 2015 at 09:19:30 -0700, Andrew Pinski wrote:
> Also why not just use strdup here? instead of malloc/sizeof/sprintf ?

Done.

Committed as obvious.


liboffloadmic/
* runtime/offload_engine.cpp (Engine::init_process): Use strdup instead
of sizeof+malloc+sprintf, check for return value.
* runtime/offload_env.cpp (MicEnvVar::get_env_var_kind): Check for
strdup return value.
* runtime/offload_host.cpp (__offload_init_library_once): Check for
strdup return value.  Fix size calculation of COI_HOST_THREAD_AFFINITY.
* runtime/emulator/coi_device.cpp (COIProcessWaitForShutdown): Check for
malloc return value.


diff --git a/liboffloadmic/runtime/offload_engine.cpp 
b/liboffloadmic/runtime/offload_engine.cpp
index 00b673a..4a88546 100644
--- a/liboffloadmic/runtime/offload_engine.cpp
+++ b/liboffloadmic/runtime/offload_engine.cpp
@@ -173,8 +173,9 @@ void Engine::init_process(void)
 // use putenv instead of setenv as Windows has no setenv.
 // Note: putenv requires its argument can't be freed or modified.
 // So no free after call to putenv or elsewhere.
-char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
-sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
+char * env_var = strdup("COI_DMA_CHANNEL_COUNT=2");
+   if (env_var == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 putenv(env_var);  
 }
 }
diff --git a/liboffloadmic/runtime/offload_env.cpp 
b/liboffloadmic/runtime/offload_env.cpp
index 79f5f36..ac33b67 100644
--- a/liboffloadmic/runtime/offload_env.cpp
+++ b/liboffloadmic/runtime/offload_env.cpp
@@ -212,10 +212,14 @@ MicEnvVarKind MicEnvVar::get_env_var_kind(
 *env_var_name_length = 3;
 *env_var_name = *env_var_def = c;
 *env_var_def = strdup(*env_var_def);
+   if (*env_var_def == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 return  c_mic_var;
 }
 *env_var_def = c + strlen("ENV=");
 *env_var_def = strdup(*env_var_def);
+   if (*env_var_def == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 return c_mic_card_env;
 }
 if (isalpha(*c)) {
@@ -229,6 +233,8 @@ MicEnvVarKind MicEnvVar::get_env_var_kind(
 return c_no_mic;
 }
 *env_var_def = strdup(*env_var_def);
+if (*env_var_def == NULL)
+  LIBOFFLOAD_ERROR(c_malloc);
 return card_is_set? c_mic_card_var : c_mic_var;
 }
 
diff --git a/liboffloadmic/runtime/offload_host.cpp 
b/liboffloadmic/runtime/offload_host.cpp
index 08f626f..eec457d 100644
--- a/liboffloadmic/runtime/offload_host.cpp
+++ b/liboffloadmic/runtime/offload_host.cpp
@@ -5173,6 +5173,8 @@ static void __offload_init_library_once(void)
 if (strcasecmp(env_var, "none") != 0) {
 // value is composed of comma separated physical device indexes
 char *buf = strdup(env_var);
+   if (buf == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 char *str, *ptr;
 for (str = strtok_r(buf, ",", &ptr); str != 0;
  str = strtok_r(0, ",", &ptr)) {
@@ -5245,7 +5247,9 @@ static void __offload_init_library_once(void)
 if (env_var != 0) {
 char * new_env_var =
(char*) malloc(sizeof("COI_HOST_THREAD_AFFINITY=") +
-  sizeof(env_var) + 1);
+  strlen(env_var));
+   if (new_env_var == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 sprintf(new_env_var, "COI_HOST_THREAD_AFFINITY=%s", env_var);
 putenv(new_env_var);
 }
@@ -5254,6 +5258,8 @@ static void __offload_init_library_once(void)
 env_var = getenv("MIC_LD_LIBRARY_PATH");
 if (env_var != 0) {
 mic_library_path = strdup(env_var);
+   if (mic_library_path == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 }
 
 
@@ -5262,6 +5268,8 @@ static void __offload_init_library_once(void)
 const char *base_name = "offload_main";
 if (mic_library_path != 0) {
 char *buf = strdup(mic_library_path);
+   if (buf == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 char *try_name = (char*) alloca(strlen(mic_library_path) +
 strlen(base_name) + 2);
 char *dir, *ptr;
@@ -5275,6 +5283,8 @@ static void __offload_init_library_once(void)
 struct stat st;
 if (stat(try_name, &st) == 0 && S_ISREG(st.st_mode)) {
 mic_device_main = strdup(try_name);
+   if (mic_device_main == NUL

Re: libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock

2015-10-09 Thread Ilya Verbin
On Fri, Oct 09, 2015 at 13:58:32 +0200, Bernd Schmidt wrote:
> One oddity I noticed in target.c is that there are two different num_devices
> variables:
> 
>   /* Total number of available devices.  */
>   static int num_devices;
> 
>   /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>   static int num_devices_openmp;
> 
> Confusingly, the get_num_devices function returns num_devices_openmp. That
> function includes a pthread_once call to gomp_target_init, which sets up
> these variables. References to num_devices_openmp through get_num_devices
> are thereforce guaranteed to be initialized. However, there are direct
> references to num_devices, in GOMP_offload_register_ver and
> GOMP_offload_unregister_ver, and they don't seem to enforce any kind of
> initialization:
> 
>   /* Load image to all initialized devices.  */
>   for (i = 0; i < num_devices; i++)
> {
>   struct gomp_device_descr *devicep = &devices[i];
>   gomp_mutex_lock (&devicep->lock);
>   if (devicep->type == target_type && devicep->is_initialized)
> gomp_load_image_to_device (devicep, version,
>host_table, target_data, true);
>   gomp_mutex_unlock (&devicep->lock);
> }
> 
> I'm guessing this only triggers when dlopening something with an offload
> image after devices have been initialized already, and it looks like we have
> symmetrical code in gomp_init_device.

Right, this code offloads given image to all initialized devices, and similar
code in gomp_init_device offloads all registered images to a given device.

> Wouldn't it be possible/better to
> force a gomp_target_init before referencing num_devices, and then relying on
> the code I quoted and deleting the image loading from gomp_init_device?

gomp_target_init only loads plugins and sets num_devices/num_devices_openmp, but
it doesn't call gomp_init_device, because we wanted to defer device
initialization as much as possible.  So, gomp_init_device is called immediately
before usage of that device.

  -- Ilya


Re: [gomp4.1] OpenMP 4.1 is dead, long live OpenMP 4.5

2015-10-09 Thread Ilya Verbin
On Fri, Oct 09, 2015 at 09:55:07 +0200, Jakub Jelinek wrote:
> -GOMP_4.1 {
> +GOMP_4.5 {
>global:
>   GOMP_target_41;
>   GOMP_target_data_41;

Should we rename it to GOMP_target*_45, or do you know some more mnemonic name?

  -- Ilya


Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data

2015-10-13 Thread Ilya Verbin
On Mon, Jun 15, 2015 at 22:48:50 +0300, Ilya Verbin wrote:
> @@ -950,50 +997,41 @@ GOMP_target (int device, void (*fn) (void *), const 
> void *unused,
> ...
> +  devicep->run_func (devicep->target_id, fn_addr, (void *) 
> tgt_vars->tgt_start);

If mapnum is 0, tgt_vars->tgt_start is uninitialized.  This is not a big bug,
because in this case the target function doesn't use this pointer, however
valgrind warns about sending uninitialized data to target.
OK for gomp-4_1-branch?


libgomp/
* target.c (gomp_map_vars): Zero tgt->tgt_start when mapnum is 0.


diff --git a/libgomp/target.c b/libgomp/target.c
index 95360d1..c4e3323 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -323,6 +323,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
mapnum,
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
 = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
+  tgt->tgt_start = 0;
   tgt->list_count = mapnum;
   tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
   tgt->device_descr = devicep;


  -- Ilya


Re: [gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-10-15 Thread Ilya Verbin
On Thu, Oct 15, 2015 at 16:01:56 +0200, Jakub Jelinek wrote:
> On Fri, Oct 02, 2015 at 10:28:01PM +0300, Ilya Verbin wrote:
> > Here is my WIP patch.  target.c part is obviously incorrect, but it 
> > demonstrates
> > a possible libgomp <-> plugin interface for running a target task function
> > asynchronously and checking whether it is completed or not.
> > (Refactored liboffloadmic/runtime/emulator from trunk is required to run
> > target-tmp.c testcase.)
> 
> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index 77bd442..31f034c 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -45,6 +45,10 @@
> >  #include "plugin-suffix.h"
> >  #endif
> >  
> > +/* FIXME: TMP */
> > +#include 
> > +#include 
> 
> I hope you mean to remove this later on.

Sure, this is just a prototype, not for committing.


> > @@ -1227,6 +1231,44 @@ gomp_target_fallback (void (*fn) (void *), void 
> > **hostaddrs)
> >*thr = old_thr;
> >  }
> >  
> > +/* Host fallback with firstprivate map-type handling.  */
> > +
> > +static void
> > +gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
> > +  void **hostaddrs, size_t *sizes,
> > +  unsigned short *kinds)
> > +{
> > +  size_t i, tgt_align = 0, tgt_size = 0;
> > +  char *tgt = NULL;
> > +  for (i = 0; i < mapnum; i++)
> > +if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
> > +  {
> > +   size_t align = (size_t) 1 << (kinds[i] >> 8);
> > +   if (tgt_align < align)
> > + tgt_align = align;
> > +   tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > +   tgt_size += sizes[i];
> > +  }
> > +  if (tgt_align)
> > +{
> > +  tgt = gomp_alloca (tgt_size + tgt_align - 1);
> > +  uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
> > +  if (al)
> > +   tgt += tgt_align - al;
> > +  tgt_size = 0;
> > +  for (i = 0; i < mapnum; i++)
> > +   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
> > + {
> > +   size_t align = (size_t) 1 << (kinds[i] >> 8);
> > +   tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > +   memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
> > +   hostaddrs[i] = tgt + tgt_size;
> > +   tgt_size = tgt_size + sizes[i];
> > + }
> > +}
> > +  gomp_target_fallback (fn, hostaddrs);
> > +}
> 
> This is ok.
> 
> >  /* Helper function of GOMP_target{,_41} routines.  */
> >  
> >  static void *
> > @@ -1311,40 +1353,19 @@ GOMP_target_41 (int device, void (*fn) (void *), 
> > size_t mapnum,
> >if (devicep == NULL
> >|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> >  {
> > -  size_t i, tgt_align = 0, tgt_size = 0;
> > -  char *tgt = NULL;
> > -  for (i = 0; i < mapnum; i++)
> > -   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
> > - {
> > -   size_t align = (size_t) 1 << (kinds[i] >> 8);
> > -   if (tgt_align < align)
> > - tgt_align = align;
> > -   tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > -   tgt_size += sizes[i];
> > - }
> > -  if (tgt_align)
> > -   {
> > - tgt = gomp_alloca (tgt_size + tgt_align - 1);
> > - uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
> > - if (al)
> > -   tgt += tgt_align - al;
> > - tgt_size = 0;
> > - for (i = 0; i < mapnum; i++)
> > -   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
> > - {
> > -   size_t align = (size_t) 1 << (kinds[i] >> 8);
> > -   tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > -   memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
> > -   hostaddrs[i] = tgt + tgt_size;
> > -   tgt_size = tgt_size + sizes[i];
> > - }
> > -   }
> > -  gomp_target_fallback (fn, hostaddrs);
> > +  gomp_target_fallback_firstprivate (fn, mapnum, hostaddrs, sizes, 
> > kinds);
> >return;
> >  }
> 
> This too.

I will commit this small part to gomp-4_5-branch separately.


> > diff --git a/libgomp/testsuite/libgomp.c/target-tmp.c 
> > b/libgomp/testsuite/libgomp.c/target-tmp.c
> > new file mode 100644
> > index 000..23a739c
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 

Re: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)

2015-10-19 Thread Ilya Verbin
On Mon, Oct 19, 2015 at 18:24:35 +0200, Thomas Schwinge wrote:
> Chung-Lin, would you please have a look at the following (on
> gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?
> 
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

Maybe it was caused by this change in gomp_unmap_vars?
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01376.html

Looking at the code, I don't see any difference in async_refcount handling, but
I was unable to test it without having hardware :(

  -- Ilya


Re: [gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-10-19 Thread Ilya Verbin
On Thu, Oct 15, 2015 at 16:01:56 +0200, Jakub Jelinek wrote:
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> >  
> > +  if (flags & GOMP_TARGET_FLAG_NOWAIT)
> > +{
> > +  gomp_create_target_task (devicep, fn_addr, mapnum, hostaddrs, sizes,
> > +  kinds, flags, depend);
> > +  return;
> > +}
> 
> But this is not ok.  You need to do this far earlier, already before the
> if (depend != NULL) code in GOMP_target_41.  And, I think you should just
> not pass fn_addr, but fn itself.
> 
> > @@ -1636,34 +1657,58 @@ void
> >  gomp_target_task_fn (void *data)
> >  {
> >struct gomp_target_task *ttask = (struct gomp_target_task *) data;
> > +  struct gomp_device_descr *devicep = ttask->devicep;
> > +
> >if (ttask->fn != NULL)
> >  {
> > -  /* GOMP_target_41 */
> > +  if (devicep == NULL
> > + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> > +   {
> > + /* FIXME: Save host fn addr into gomp_target_task?  */
> > + gomp_target_fallback_firstprivate (NULL, ttask->mapnum,
> 
> If you pass above fn instead of fn_addr, ttask->fn is what you want
> to pass to gomp_target_fallback_firstprivate here and remove the FIXME.
> 
> > +ttask->hostaddrs, ttask->sizes,
> > +ttask->kinds);
> > + return;
> > +   }
> > +
> > +  struct target_mem_desc *tgt_vars
> > +   = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
> > +ttask->sizes, ttask->kinds, true,
> > +GOMP_MAP_VARS_TARGET);
> > +  devicep->async_run_func (devicep->target_id, ttask->fn,
> > +  (void *) tgt_vars->tgt_start, data);
> 
> You need to void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
> first obviously, and pass fn_addr.
> 
> > +
> > +  /* FIXME: TMP example of checking for completion.
> > +Alternatively the plugin can set some completion flag in ttask.  */
> > +  while (!devicep->async_is_completed_func (devicep->target_id, data))
> > +   {
> > + fprintf (stderr, "-");
> > + usleep (10);
> > +   }
> 
> This obviously doesn't belong here.
> 
> >if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> > diff --git a/libgomp/testsuite/libgomp.c/target-tmp.c 
> > b/libgomp/testsuite/libgomp.c/target-tmp.c
> > new file mode 100644
> > index 000..23a739c
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 +1,40 @@
> > +#include 
> > +#include 
> > +
> > +#pragma omp declare target
> > +void foo (int n)
> > +{
> > +  printf ("Start tgt %d\n", n);
> > +  usleep (500);
> 
> 5s is too long.  Not to mention that not sure if PTX can do printf
> and especially usleep.
> 
> > diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
> > b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > index 26ac6fe..c843710 100644
> > --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> ...
> > +/* Set of asynchronously running target tasks.  */
> > +static std::set *async_tasks;
> > +
> >  /* Thread-safe registration of the main image.  */
> >  static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
> >  
> > +/* Mutex for protecting async_tasks.  */
> > +static pthread_mutex_t async_tasks_lock = PTHREAD_MUTEX_INITIALIZER;
> > +
> >  static VarDesc vd_host2tgt = {
> >{ 1, 1 },  /* dst, src */
> >{ 1, 0 },  /* in, out  */
> > @@ -156,6 +163,8 @@ init (void)
> >  
> >  out:
> >address_table = new ImgDevAddrMap;
> > +  async_tasks = new std::set;
> > +  pthread_mutex_init (&async_tasks_lock, NULL);
> 
> PTHREAD_MUTEX_INITIALIZER should already initialize the lock.
> But, do you really need async_tasks and the lock?  Better store
> something into some plugin's owned field in target_task struct and
> let the plugin callback be passed address of that field rather than the
> whole target_task?

So, here is what I have for now.  Attached target-29.c testcase works fine with
MIC emul, however I don't know how to (and where) properly check for completion
of async execution on target.  And, similarly, where to do unmapping after that?
Do we need a callback from plugin to libgomp (as far as I understood, PTX
runtime supports this, but HSA doesn't), or libgomp will just check for
ttask->is_completed in task.c?

 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9c8b1fb..e707c80 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -430,6 +430,7 @@ struct gomp_target_task
   size_t *sizes;
   unsigned short *kinds;
   unsigned int flags;
+  bool is_completed;
   void *hostaddrs[];
 };
 
@@ -877,6 +878,7 @@ struct gomp_device_descr
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void *(*dev2dev_func) (int, void *, const void *, size_t);
   void (*run

Re: [gomp4] lto error message

2015-10-20 Thread Ilya Verbin
On Tue, Oct 20, 2015 at 15:54:45 -0400, Nathan Sidwell wrote:
> @@ -1209,16 +1209,11 @@ input_overwrite_node (struct lto_file_de
>  
>if (!success)
>  {
> -  if (flag_openacc)
> - {
> -   if (TREE_CODE (node->decl) == FUNCTION_DECL)
> - error ("Missing routine function %<%s%>", node->name ());
> -   else
> - error ("Missing declared variable %<%s%>", node->name ());
> - }
> -
> +  gcc_assert (flag_openacc);
> +  if (TREE_CODE (node->decl) == FUNCTION_DECL)
> + error ("missing OpenACC % function %qD", node->decl);
>else
> - gcc_unreachable ();
> + error ("missing OpenACC % variable %qD", node->decl);
>  }
>  }

There might be a situation when some func or var is lost during regular LTO,
even if flag_openacc is present.  In this case "missing OpenACC ..." message
would be wrong.  And if flag_openacc is absent, gcc_assert (flag_openacc) is a
bit confusing.  We disscussed this with Cesar here:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02076.html

  -- Ilya


Re: Constify host-side offload data`

2015-10-21 Thread Ilya Verbin
Hi!

On Wed, Jul 15, 2015 at 20:56:50 -0400, Nathan Sidwell wrote:
> --- libgcc/offloadstuff.c (revision 225851)
> +++ libgcc/offloadstuff.c (working copy)
> ...
> -void *__offload_func_table[0]
> +const void *const __offload_func_table[0]
> ...
> -void *__offload_var_table[0]
> +const void *const __offload_var_table[0]

I've just noticed that this patch + similar change in intelmic-mkoffload.c
 bumps up the filesize
of "helloworld" with offloading to MIC from 17KB to 4MB!

This happens because .gnu.offload_{funcs,vars} sections in
crtoffload{begin,end}.o now doesn't have WRITE flag, but the same sections
produced by omp_finish_file has it.  When linker joins writable + nonwritable
sections from several objects, it inserts some weird 2MB offset into the final
binary.  I.e. now there are 2 such offsets: one in the host binary and one in
the MIC target image, hence 4MB.  I haven't investigated how it happens, because
I thing it's bad idea to join sections with different flags.

But we can't make .gnu.offload_{funcs,vars} in omp_finish_file also readonly,
because in case of shared libraries there are R_X86_64_RELATIVE relocations,
which make these sections writable.  So, I guess we need to remove all consts to
make these sections writable in all objects.

H.J.,
Maybe linker should print some warning about joining writable + nonwritable
sections?  Here is a simple testcase:

$ cat t1.s
.section ".AAA", "a"
.long 0x12345678
$ cat t2.s
.section ".AAA", "wa"
.long 0x12345678
$ as t1.s -o t1.o
$ as t2.s -o t2.o
$ ld -shared t1.o t2.o
$ ls -lh a.out
2.1M a.out

  -- Ilya


Re: Constify host-side offload data`

2015-10-21 Thread Ilya Verbin
On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin  wrote:
> > H.J.,
> > Maybe linker should print some warning about joining writable + nonwritable
> > sections?  Here is a simple testcase:
> >
> > $ cat t1.s
> > .section ".AAA", "a"
> > .long 0x12345678
> > $ cat t2.s
> > .section ".AAA", "wa"
> > .long 0x12345678
> > $ as t1.s -o t1.o
> > $ as t2.s -o t2.o
> > $ ld -shared t1.o t2.o
> > $ ls -lh a.out
> > 2.1M a.out
> >
> 
> Does linker make AAA  writable? If yes, linker does what it
> is told.

Yes, it makes it writable, but why it also makes this?

  [Nr] Name  Type Address   Offset
   Size  EntSize  Flags  Link  Info  Align
  [ 0]   NULL   
        0 0 0
  [ 1] .hash HASH 00b0  00b0
   0028  0004   A   2 0 8
  [ 2] .dynsym   DYNSYM   00d8  00d8
   0078  0018   A   3 2 8
  [ 3] .dynstr   STRTAB   0150  0150
   0019     A   0 0 1
  [ 4] .AAA  PROGBITS 0169  0169
   0008    WA   0 0 1
  [ 5] .eh_frame PROGBITS 0178  0178
        A   0 0 8
  [ 6] .dynamic  DYNAMIC  00200178  00200178  <-- ???
   00b0  0010  WA   3 0 8
  [ 7] .shstrtab STRTAB     00200380
   0049     0 0 1
  [ 8] .symtab   SYMTAB     00200228
   0120  0018   9 9 8
  [ 9] .strtab   STRTAB     00200348
   0038     0 0 1

  -- Ilya


Re: [OpenACC 11/11] execution tests

2015-10-21 Thread Ilya Verbin


> On 21 Oct 2015, at 22:53, Nathan Sidwell  wrote:
> 
> This patch has some new execution tests, verifying loop partitioning is 
> behaving as expected.
> 
> There are more execution tests on the gomp4 branch, but many of them use 
> reductions.  We'll merge those once reductions are merged.
> 
> nathan
> <11-trunk-tests.patch>

Does the testcase with offload IR appear here accidentally?

  -- Ilya

Re: Constify host-side offload data`

2015-10-22 Thread Ilya Verbin
On Wed, Oct 21, 2015 at 10:44:56 -0700, H.J. Lu wrote:
> On Wed, Oct 21, 2015 at 10:42 AM, Ilya Verbin  wrote:
> > On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
> >> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin  wrote:
> >> > H.J.,
> >> > Maybe linker should print some warning about joining writable + 
> >> > nonwritable
> >> > sections?  Here is a simple testcase:
> >> >
> >> > $ cat t1.s
> >> > .section ".AAA", "a"
> >> > .long 0x12345678
> >> > $ cat t2.s
> >> > .section ".AAA", "wa"
> >> > .long 0x12345678
> >> > $ as t1.s -o t1.o
> >> > $ as t2.s -o t2.o
> >> > $ ld -shared t1.o t2.o
> >> > $ ls -lh a.out
> >> > 2.1M a.out
> >> >
> >>
> >> Does linker make AAA  writable? If yes, linker does what it
> >> is told.
> >
> > Yes, it makes it writable, but why it also makes this?
> >
> >   [Nr] Name  Type Address   Offset
> >Size  EntSize  Flags  Link  Info  Align
> >   [ 0]   NULL   
> >     0 0 0
> >   [ 1] .hash HASH 00b0  00b0
> >0028  0004   A   2 0 8
> >   [ 2] .dynsym   DYNSYM   00d8  00d8
> >0078  0018   A   3 2 8
> >   [ 3] .dynstr   STRTAB   0150  0150
> >0019     A   0 0 1
> >   [ 4] .AAA  PROGBITS 0169  0169
> >0008    WA   0 0 1
> >   [ 5] .eh_frame PROGBITS 0178  0178
> >     A   0 0 8
> >   [ 6] .dynamic  DYNAMIC  00200178  00200178  <-- 
> > ???
> >00b0  0010  WA   3 0 8
> >   [ 7] .shstrtab STRTAB     00200380
> >0049     0 0 1
> >   [ 8] .symtab   SYMTAB     00200228
> >0120  0018   9 9 8
> >   [ 9] .strtab   STRTAB     00200348
> >0038     0 0 1
> >
> 
> Linker groups input sections by section name and ors section
> flags.

Could you please help figure out how this number 0x200178 is calculated?
ld -verbose doesn't show anything helpful.  It seems that something goes wrong
during section-to-segment mapping, because when both .AAA have "wa" flags, we
got small binary with 2 LOAD segments:
  Type   Offset VirtAddr   PhysAddr
 FileSizMemSiz  Flags  Align
  LOAD   0x 0x 0x
 0x01a8 0x01a8  R  20
  LOAD   0x01a8 0x002001a8 0x002001a8
 0x00b8 0x00b8  RW 20

But when one .AAA has "a" flag, and another .AAA has "wa" flag, we got huge
binary with only one big LOAD segment:
  Type   Offset VirtAddr   PhysAddr
 FileSizMemSiz  Flags  Align
  LOAD   0x 0x 0x
 0x00200228 0x00200228  RW 20

BTW, gold produces small binary in both cases.

Thanks,
  -- Ilya


Re: Constify host-side offload data`

2015-10-22 Thread Ilya Verbin
On Thu, Oct 22, 2015 at 07:35:55 -0700, H.J. Lu wrote:
> On Thu, Oct 22, 2015 at 7:11 AM, Ilya Verbin  wrote:
> > On Wed, Oct 21, 2015 at 10:44:56 -0700, H.J. Lu wrote:
> >> On Wed, Oct 21, 2015 at 10:42 AM, Ilya Verbin  wrote:
> >> > On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
> >> >> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin  wrote:
> >> >> > H.J.,
> >> >> > Maybe linker should print some warning about joining writable + 
> >> >> > nonwritable
> >> >> > sections?  Here is a simple testcase:
> >> >> >
> >> >> > $ cat t1.s
> >> >> > .section ".AAA", "a"
> >> >> > .long 0x12345678
> >> >> > $ cat t2.s
> >> >> > .section ".AAA", "wa"
> >> >> > .long 0x12345678
> >> >> > $ as t1.s -o t1.o
> >> >> > $ as t2.s -o t2.o
> >> >> > $ ld -shared t1.o t2.o
> >> >> > $ ls -lh a.out
> >> >> > 2.1M a.out
> >> >> >
> >> >>
> >> >> Does linker make AAA  writable? If yes, linker does what it
> >> >> is told.
> >> >
> >> > Yes, it makes it writable, but why it also makes this?
> >> >
> >> >   [Nr] Name  Type Address   Offset
> >> >Size  EntSize  Flags  Link  Info  Align
> >> >   [ 0]   NULL   
> >> >     0 0 0
> >> >   [ 1] .hash HASH 00b0  00b0
> >> >0028  0004   A   2 0 8
> >> >   [ 2] .dynsym   DYNSYM   00d8  00d8
> >> >0078  0018   A   3 2 8
> >> >   [ 3] .dynstr   STRTAB   0150  0150
> >> >0019     A   0 0 1
> >> >   [ 4] .AAA  PROGBITS 0169  0169
> >> >0008    WA   0 0 1
> >> >   [ 5] .eh_frame PROGBITS 0178  0178
> >> >     A   0 0 8
> >> >   [ 6] .dynamic  DYNAMIC  00200178  00200178  
> >> > <-- ???
> >> >00b0  0010  WA   3 0 8
> >> >   [ 7] .shstrtab STRTAB     00200380
> >> >0049     0 0 1
> >> >   [ 8] .symtab   SYMTAB     00200228
> >> >0120  0018   9 9 8
> >> >   [ 9] .strtab   STRTAB     00200348
> >> >0038     0 0 1
> >> >
> >>
> >> Linker groups input sections by section name and ors section
> >> flags.
> >
> > Could you please help figure out how this number 0x200178 is calculated?
> > ld -verbose doesn't show anything helpful.  It seems that something goes 
> > wrong
> > during section-to-segment mapping, because when both .AAA have "wa" flags, 
> > we
> > got small binary with 2 LOAD segments:
> >   Type   Offset VirtAddr   PhysAddr
> >  FileSizMemSiz  Flags  Align
> >   LOAD   0x 0x 0x
> >  0x01a8 0x01a8  R  20
> >   LOAD   0x01a8 0x002001a8 0x002001a8
> >  0x00b8 0x00b8  RW 20
> >
> > But when one .AAA has "a" flag, and another .AAA has "wa" flag, we got huge
> > binary with only one big LOAD segment:
> >   Type   Offset VirtAddr   PhysAddr
> >  FileSizMemSiz  Flags  Align
> >   LOAD   0x 0x 0x
> >  0x00200228 0x00200228  RW 20
> >
> > BTW, gold produces small binary in both cases.
> >
> 
> Please open a binutils bug with a testcase.

Done: https://sourceware.org/bugzilla/show_bug.cgi?id=19162

  -- Ilya


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2015-10-22 Thread Ilya Verbin
On Mon, Dec 22, 2014 at 13:01:40 +0100, Thomas Schwinge wrote:
> By chance (when tracking down a different problem), I've found the
> following.  Would you please check whether that's a real problem in
> liboffloadmic, or its libgomp plugin, or just a mis-diagnosis by
> Valgrind?
> 
> ==21327== Syscall param write(buf) points to uninitialised byte(s)

Finally we have investigated this :)  Valgrind warns about uninitialized bytes,
inserted into the struct for alignment.  It's possible to avoid the warning by
the patch bellow.  Should I commit it, or just leave it as is?


diff --git a/liboffloadmic/runtime/offload_host.cpp 
b/liboffloadmic/runtime/offload_host.cpp
index d04233f..66c2a01 100644
--- a/liboffloadmic/runtime/offload_host.cpp
+++ b/liboffloadmic/runtime/offload_host.cpp
@@ -2425,6 +2425,7 @@ bool OffloadDescriptor::setup_misc_data(const char *name)
misc_data_size);
 if (m_func_desc == NULL)
   LIBOFFLOAD_ERROR(c_malloc);
+   memset (m_func_desc, 0, m_func_desc_size + misc_data_size);
 m_func_desc->console_enabled = console_enabled;
 m_func_desc->timer_enabled = offload_report_enabled &&
 (timer_enabled || offload_report_level);


  -- Ilya


Re: [gomp4.1] map clause parsing improvements

2015-10-26 Thread Ilya Verbin
On Tue, Oct 20, 2015 at 12:03:40 +0200, Jakub Jelinek wrote:
> On Mon, Oct 19, 2015 at 05:00:33PM +0200, Thomas Schwinge wrote:
> >   n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
> >   if ((ctx->region_type & ORT_TARGET) != 0
> >   && !(n->value & GOVD_SEEN)
> >   && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
> >   || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
> > {
> >   remove = true;
> > 
> > I'd suggest turning GOMP_MAP_FLAG_ALWAYS into GOMP_MAP_FLAG_SPECIAL_2,
> > and then provide a GOMP_MAP_ALWAYS_P that evaluates to true just for the
> > three "always,to", "always,from", and "always,tofrom" cases.
> 
> Yeah, that can be done, I'll add it to my todo list.

Is this what you planned?  I've replaced all 3 uses of GOMP_MAP_FLAG_ALWAYS with
GOMP_MAP_ALWAYS_P.  make check and check-target-libgomp passed, however these 2
changes in gimplify_scan_omp_clauses are not covered by the testsuite, so I'm
not entirely sure that they are correct.  OK for gomp-4_5-branch?


gcc/
* gimplify.c (gimplify_scan_omp_clauses): Use GOMP_MAP_ALWAYS_P.
(gimplify_adjust_omp_clauses): Likewise.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_2): Define.
(GOMP_MAP_FLAG_ALWAYS): Remove.
(enum gomp_map_kind): Use GOMP_MAP_FLAG_SPECIAL_2 instead of
GOMP_MAP_FLAG_ALWAYS for GOMP_MAP_ALWAYS_TO, GOMP_MAP_ALWAYS_FROM,
GOMP_MAP_ALWAYS_TOFROM, GOMP_MAP_STRUCT, GOMP_MAP_RELEASE.
(GOMP_MAP_ALWAYS_P): Define.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ee5cb95..57ab6c6 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6613,7 +6613,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  struct_map_to_clause->put (decl, *list_p);
  list_p = &OMP_CLAUSE_CHAIN (*list_p);
  flags = GOVD_MAP | GOVD_EXPLICIT;
- if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
+ if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)))
flags |= GOVD_SEEN;
  goto do_add_decl;
}
@@ -6623,7 +6623,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  tree *sc = NULL, *pt = NULL;
  if (!ptr && TREE_CODE (*osc) == TREE_LIST)
osc = &TREE_PURPOSE (*osc);
- if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
+ if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)))
n->value |= GOVD_SEEN;
  offset_int o1, o2;
  if (offset)
@@ -7363,7 +7363,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree 
*list_p,
  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
  if ((ctx->region_type & ORT_TARGET) != 0
  && !(n->value & GOVD_SEEN)
- && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
+ && (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0
  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
{
  remove = true;
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index f834dec..2c6f011 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -39,10 +39,9 @@
 /* Special map kinds, enumerated starting here.  */
 #define GOMP_MAP_FLAG_SPECIAL_0(1 << 2)
 #define GOMP_MAP_FLAG_SPECIAL_1(1 << 3)
+#define GOMP_MAP_FLAG_SPECIAL_2(1 << 4)
 #define GOMP_MAP_FLAG_SPECIAL  (GOMP_MAP_FLAG_SPECIAL_1 \
 | GOMP_MAP_FLAG_SPECIAL_0)
-/* OpenMP always flag.  */
-#define GOMP_MAP_FLAG_ALWAYS   (1 << 6)
 /* Flag to force a specific behavior (or else, trigger a run-time error).  */
 #define GOMP_MAP_FLAG_FORCE(1 << 7)
 
@@ -95,29 +94,31 @@ enum gomp_map_kind
 GOMP_MAP_FORCE_TOFROM =(GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
 /* If not already present, allocate.  And unconditionally copy to
device.  */
-GOMP_MAP_ALWAYS_TO =   (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TO),
+GOMP_MAP_ALWAYS_TO =   (GOMP_MAP_FLAG_SPECIAL_2 | GOMP_MAP_TO),
 /* If not already present, allocate.  And unconditionally copy from
device.  */
-GOMP_MAP_ALWAYS_FROM = (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_FROM),
+GOMP_MAP_ALWAYS_FROM = (GOMP_MAP_FLAG_SPECIAL_2
+| GOMP_MAP_FROM),
 /* If not already present, allocate.  And unconditionally copy to and from
device.  */
-GOMP_MAP_ALWAYS_TOFROM =   (GOMP_MAP_FLAG_ALWAYS | 
GOMP_MAP_TOFROM),
+GOMP_MAP_ALWAYS_TOFROM =   (GOMP_MAP_FLAG_SPECIAL_2
+| GOMP_MAP_TOFROM),
 /* Map a sparse struct

Re: [gomp4.1] map clause parsing improvements

2015-10-26 Thread Ilya Verbin
On Mon, Oct 26, 2015 at 14:07:13 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 03:53:57PM +0300, Ilya Verbin wrote:
> > @@ -7363,7 +7363,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree 
> > *list_p,
> >   n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
> >   if ((ctx->region_type & ORT_TARGET) != 0
> >   && !(n->value & GOVD_SEEN)
> > - && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
> > + && (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0
> >   || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
> 
> The || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT part can go then too,
> it was there only because (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
> has been non-zero for GOMP_MAP_STRUCT (and the () pair around the condition
> too).

Oops, missed that.

> We want to be able to remove all map clauses on the target construct, except
> if it is always {to,from,tofrom}.
> We do not want to remove release or delete, but those only exist on target
> exit data and thus are handled by (ctx->region_type & ORT_TARGET) != 0.
> 
> > @@ -142,6 +143,10 @@ enum gomp_map_kind
> >  #define GOMP_MAP_ALWAYS_FROM_P(X) \
> >(((X) == GOMP_MAP_ALWAYS_FROM) || ((X) == GOMP_MAP_ALWAYS_TOFROM))
> >  
> > +#define GOMP_MAP_ALWAYS_P(X) \
> > +  (((X) == GOMP_MAP_ALWAYS_TO) || ((X) == GOMP_MAP_ALWAYS_FROM) \
> > +   || ((X) == GOMP_MAP_ALWAYS_TOFROM))
> 
> You could simplify this e.g. to
>   (((X) == GOMP_MAP_ALWAYS_TO) || GOMP_MAP_ALWAYS_FROM_P (X))
> or
>   (GOMP_MAP_ALWAYS_TO_P (X) || ((X) == GOMP_MAP_ALWAYS_FROM))
> 
> Otherwise, LGTM.

Done.  Here is what I committed:


gcc/
* gimplify.c (gimplify_scan_omp_clauses): Use GOMP_MAP_ALWAYS_P.
(gimplify_adjust_omp_clauses): Likewise.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_2): Define.
(GOMP_MAP_FLAG_ALWAYS): Remove.
(enum gomp_map_kind): Use GOMP_MAP_FLAG_SPECIAL_2 instead of
GOMP_MAP_FLAG_ALWAYS for GOMP_MAP_ALWAYS_TO, GOMP_MAP_ALWAYS_FROM,
GOMP_MAP_ALWAYS_TOFROM, GOMP_MAP_STRUCT, GOMP_MAP_RELEASE.
(GOMP_MAP_ALWAYS_P): Define.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ee5cb95..a308307 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6613,7 +6613,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  struct_map_to_clause->put (decl, *list_p);
  list_p = &OMP_CLAUSE_CHAIN (*list_p);
  flags = GOVD_MAP | GOVD_EXPLICIT;
- if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
+ if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)))
flags |= GOVD_SEEN;
  goto do_add_decl;
}
@@ -6623,7 +6623,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  tree *sc = NULL, *pt = NULL;
  if (!ptr && TREE_CODE (*osc) == TREE_LIST)
osc = &TREE_PURPOSE (*osc);
- if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
+ if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)))
n->value |= GOVD_SEEN;
  offset_int o1, o2;
  if (offset)
@@ -7363,8 +7363,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree 
*list_p,
  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
  if ((ctx->region_type & ORT_TARGET) != 0
  && !(n->value & GOVD_SEEN)
- && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
- || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
+ && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0)
{
  remove = true;
  /* For struct element mapping, if struct is never referenced
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index f834dec..008a4a4 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -39,10 +39,9 @@
 /* Special map kinds, enumerated starting here.  */
 #define GOMP_MAP_FLAG_SPECIAL_0(1 << 2)
 #define GOMP_MAP_FLAG_SPECIAL_1(1 << 3)
+#define GOMP_MAP_FLAG_SPECIAL_2(1 << 4)
 #define GOMP_MAP_FLAG_SPECIAL  (GOMP_MAP_FLAG_SPECIAL_1 \
 | GOMP_MAP_FLAG_SPECIAL_0)
-/* OpenMP always flag.  */
-#define GOMP_MAP_FLAG_ALWAYS   (1 << 6)
 /* Flag to force a specific behavior (or else, trigger a run-time error).  */
 #define GOMP_MAP_FLAG_FORCE(1 << 7)
 
@@ -95,29 +94,31 @@ enum gomp_map_kind
 GOMP_MAP_FORCE_TOFROM =

Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2015-10-26 Thread Ilya Verbin
On Fri, Oct 23, 2015 at 10:10:06 +0200, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 09:26:37PM +0300, Ilya Verbin wrote:
> > On Mon, Dec 22, 2014 at 13:01:40 +0100, Thomas Schwinge wrote:
> > > By chance (when tracking down a different problem), I've found the
> > > following.  Would you please check whether that's a real problem in
> > > liboffloadmic, or its libgomp plugin, or just a mis-diagnosis by
> > > Valgrind?
> > > 
> > > ==21327== Syscall param write(buf) points to uninitialised byte(s)
> > 
> > Finally we have investigated this :)  Valgrind warns about uninitialized 
> > bytes,
> > inserted into the struct for alignment.  It's possible to avoid the warning 
> > by
> > the patch bellow.  Should I commit it, or just leave it as is?
> 
> Or use calloc instead of malloc, or add two uint8_t padding fields after the
> two uint8_t fields and initialize them too.  Though, as you have some
> padding after the name, I think calloc is best.

Here is what I committed to trunk together with an obvious change.


liboffloadmic/
* runtime/offload_host.cpp (OffloadDescriptor::setup_misc_data): Use
calloc instead of malloc.
(__offload_fini_library): Set mic_engines_total to zero.


diff --git a/liboffloadmic/runtime/offload_host.cpp 
b/liboffloadmic/runtime/offload_host.cpp
index c6c6518..a150410 100644
--- a/liboffloadmic/runtime/offload_host.cpp
+++ b/liboffloadmic/runtime/offload_host.cpp
@@ -2424,8 +2424,8 @@ bool OffloadDescriptor::setup_misc_data(const char *name)
 }
 
 // initialize function descriptor
-m_func_desc = (FunctionDescriptor*) malloc(m_func_desc_size +
-   misc_data_size);
+m_func_desc = (FunctionDescriptor*) calloc(1, m_func_desc_size
+ + misc_data_size);
 if (m_func_desc == NULL)
   LIBOFFLOAD_ERROR(c_malloc);
 m_func_desc->console_enabled = console_enabled;
@@ -5090,6 +5090,7 @@ static void __offload_fini_library(void)
 OFFLOAD_DEBUG_TRACE(2, "Cleanup offload library ...\n");
 if (mic_engines_total > 0) {
 delete[] mic_engines;
+mic_engines_total = 0;
 
 if (mic_proxy_fs_root != 0) {
 free(mic_proxy_fs_root);


  -- Ilya


Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-10-26 Thread Ilya Verbin
On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> As the testcases show, #pragma omp declare target has now a new form (well,
> two; with some issues on it pending), where it is used just as a single
> declarative directive rather than a pair of them and allows marking
> vars and functions by name as "omp declare target" vars/functions (which the
> middle-end etc. already handles), but also "omp declare target link", which
> is a deferred var, that is not initially mapped (on devices without shared
> memory with host), but has to be mapped explicitly.

I don't quite understand how link should work.  OpenMP 4.5 says:

"The list items of a link clause are not mapped by the declare target directive.
Instead, their mapping is deferred until they are mapped by target data or
target constructs. They are mapped only for such regions."

But doesn't this mean that the example bellow should work identically
with/without USE_LINK defined?  Or is there some difference on other testcases?

int a = 1;

#ifdef USE_LINK
#pragma omp declare target link(a)
#endif

int main ()
{
  a = 2;
  int res;
  #pragma omp target map(to: a) map(from: res)
res = a;
  return res;
}

> This patch only marks them with the new attribute, the actual middle-end
> implementation needs to be implemented.
> 
> I believe OpenACC has something similar, but no idea if it is already
> implemented.
> 
> Anyway, I think the implementation should be that in some pass running on
> the ACCEL_COMPILER side (guarded by separate address space aka non-HSA)

HSA does not define ACCEL_COMPILER, because it uses only one compiler.

> we actually replace the variables with pointers to variables, then need
> to somehow also mark those in the offloading tables, so that the library

I see 2 possible options: use the MSB of the size, or introduce the third field
for flags.

> registers them (the locations of the pointers to the vars), but also marks
> them for special treatment, and then when actually trying to map them
> (or their parts, guess that needs to be discussed) we allocate them or
> whatever is requested and store the device pointer into the corresponding
> variable.
> 
> Ilya, Thomas, thoughts on this?

  -- Ilya


Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-10-26 Thread Ilya Verbin
On Mon, Oct 26, 2015 at 20:05:39 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 09:35:52PM +0300, Ilya Verbin wrote:
> > On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > > As the testcases show, #pragma omp declare target has now a new form 
> > > (well,
> > > two; with some issues on it pending), where it is used just as a single
> > > declarative directive rather than a pair of them and allows marking
> > > vars and functions by name as "omp declare target" vars/functions (which 
> > > the
> > > middle-end etc. already handles), but also "omp declare target link", 
> > > which
> > > is a deferred var, that is not initially mapped (on devices without shared
> > > memory with host), but has to be mapped explicitly.
> > 
> > I don't quite understand how link should work.  OpenMP 4.5 says:
> > 
> > "The list items of a link clause are not mapped by the declare target 
> > directive.
> > Instead, their mapping is deferred until they are mapped by target data or
> > target constructs. They are mapped only for such regions."
> >
> > But doesn't this mean that the example bellow should work identically
> > with/without USE_LINK defined?  Or is there some difference on other 
> > testcases?
> 
> On your testcase, the end result is pretty much the same, the variable is
> not mapped initially to the device, and at the beginning of omp target it is
> mapped to device, at the end of the region it is unmapped from the device
> (without copying back).
> 
> But consider:
> 
> int a = 1, b = 1;
> #pragma omp declare target link (a) to (b)
> int
> foo (void)
> {
>   return a++ + b++;
> }
> #pragma omp declare target to (foo)
> int
> main ()
> {
>   a = 2;
>   b = 2;
>   int res;
>   #pragma omp target map (to: a, b) map (from: res)
>   {
> res = foo () + foo ();
>   }
>   // This assumes only non-shared address space, so would need to be guarded
>   // for that.
>   if (res != (2 + 1) + (3 + 2))
> __builtin_abort ();
>   return 0;
> }
> 
> Without declare target link or to, you can't use the global variables
> in orphaned accelerated routines (unless you e.g. take the address of the
> mapped variable in the region and pass it around).
> The to variables (non-deferred) are always mapped and are initialized with
> the original initializer, refcount is infinity.  link (deferred) work more
> like the normal mapping, referencing those vars when they aren't explicitly
> (or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly
> with to kind, it gets the current value of the host var rather than the
> original one.  But, beyond the mapping the compiler needs to ensure that
> all uses of the link global var (or perhaps just all uses of the link global
> var outside of the target construct body where it is mapped, because you
> could use there the pointer you got from GOMP_target) are replaced by
> dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
> &*a_tmp, and that the runtime library during registration of the tables is
> told about the address of this artificial pointer.  During registration,
> I'd expect it would stick an entry for this range into the table, with some
> special flag or something similar, indicating that it is deferred mapping
> and where the offloading device pointer is.  During mapping, it would map it
> as any other not yet mapped object, but additionally would also set this
> device pointer to the device address of the mapped object.  We also need to
> ensure that when we drop the refcount of that mapping back to 0, we get it
> back to the state where it is described as a range with registered deferred
> mapping and where the device pointer is.

Ok, got it, I'll try implement this...

> > > we actually replace the variables with pointers to variables, then need
> > > to somehow also mark those in the offloading tables, so that the library
> > 
> > I see 2 possible options: use the MSB of the size, or introduce the third 
> > field
> > for flags.
> 
> Well, it can be either recorded in the host variable tables (which contain
> address and size pair, right), or in corresponding offloading device table
> (which contains the pointer, something else?).

It contains a size too, which is checked in libgomp:
  gomp_fatal ("Can't map target variables (size mismatch)");
Yes, we can remove this check, and use second field in device table for flags.

  -- Ilya


Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-10-27 Thread Ilya Verbin
On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> As the testcases show, #pragma omp declare target has now a new form (well,
> two; with some issues on it pending), where it is used just as a single
> declarative directive rather than a pair of them and allows marking
> vars and functions by name as "omp declare target" vars/functions (which the
> middle-end etc. already handles),

There is an issue - such variables are not added to the offloading tables,
because when varpool_node::get_create is called for the first time, the variable
doesn't yet have "omp declare target" attribute, and when it's called for the
second time, it just returns existing node.  Functions also aren't marked as
offloadable.  I tried to fix this by moving the code from
varpool_node::get_create to varpool_node::finalize_decl, but it helped only C,
but doesn't fix C++.  Therefore, I decided to iterate through all functions and
variables, like in the patch bellow.  But it doesn't work for static vars,
declared inside functions, because they do not appear in symtab :(


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 1a64d789..0ba04ef 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -511,16 +511,6 @@ cgraph_node::create (tree decl)
   gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
 
   node->decl = decl;
-
-  if ((flag_openacc || flag_openmp)
-  && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
-{
-  node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
-  g->have_offload = true;
-#endif
-}
-
   node->register_symbol ();
 
   if (DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 04a4d3f..9ac7b36 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1016,6 +1016,25 @@ analyze_functions (bool first_time)
   symtab->state = CONSTRUCTION;
   input_location = UNKNOWN_LOCATION;
 
+  /* Process offloadable functions and variables.  */
+  if (first_time && (flag_openacc || flag_openmp))
+FOR_EACH_SYMBOL (node)
+  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES 
(node->decl)))
+   {
+ node->offloadable = 1;
+
+#ifdef ENABLE_OFFLOADING
+ g->have_offload = true;
+
+ if (TREE_CODE (node->decl) == VAR_DECL && !DECL_EXTERNAL (node->decl))
+   {
+ if (!in_lto_p)
+   vec_safe_push (offload_vars, node->decl);
+ node->force_output = 1;
+   }
+#endif
+   }
+
   /* Ugly, but the fixup can not happen at a time same body alias is created;
  C++ FE is confused about the COMDAT groups being right.  */
   if (symtab->cpp_implicit_aliases_done)
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 7d11e20..077dd40 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -154,19 +154,6 @@ varpool_node::get_create (tree decl)
 
   node = varpool_node::create_empty ();
   node->decl = decl;
-
-  if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl)
-  && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
-{
-  node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
-  g->have_offload = true;
-  if (!in_lto_p)
-   vec_safe_push (offload_vars, decl);
-  node->force_output = 1;
-#endif
-}
-
   node->register_symbol ();
   return node;
 }
diff --git a/libgomp/testsuite/libgomp.c++/target-13.C 
b/libgomp/testsuite/libgomp.c++/target-13.C
index 376672d..5279ac0 100644
--- a/libgomp/testsuite/libgomp.c++/target-13.C
+++ b/libgomp/testsuite/libgomp.c++/target-13.C
@@ -1,11 +1,14 @@
 extern "C" void abort (void);
 
+int g;
+#pragma omp declare target (g)
+
 #pragma omp declare target
 int
 foo (void)
 {
   static int s;
-  return ++s;
+  return ++s + g;
 }
 #pragma omp end declare target
 
diff --git a/libgomp/testsuite/libgomp.c/target-28.c 
b/libgomp/testsuite/libgomp.c/target-28.c
index c9a2999..96e9e05 100644
--- a/libgomp/testsuite/libgomp.c/target-28.c
+++ b/libgomp/testsuite/libgomp.c/target-28.c
@@ -1,11 +1,14 @@
 extern void abort (void);
 
+int g;
+#pragma omp declare target (g)
+
 #pragma omp declare target
 int
 foo (void)
 {
   static int s;
-  return ++s;
+  return ++s + g;
 }
 #pragma omp end declare target
 
 
  -- Ilya


Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-10-30 Thread Ilya Verbin
On Wed, Oct 28, 2015 at 00:11:03 +0300, Ilya Verbin wrote:
> On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > As the testcases show, #pragma omp declare target has now a new form (well,
> > two; with some issues on it pending), where it is used just as a single
> > declarative directive rather than a pair of them and allows marking
> > vars and functions by name as "omp declare target" vars/functions (which the
> > middle-end etc. already handles),
> 
> There is an issue - such variables are not added to the offloading tables,
> because when varpool_node::get_create is called for the first time, the 
> variable
> doesn't yet have "omp declare target" attribute, and when it's called for the
> second time, it just returns existing node.  Functions also aren't marked as
> offloadable.  I tried to fix this by moving the code from
> varpool_node::get_create to varpool_node::finalize_decl, but it helped only C,
> but doesn't fix C++.  Therefore, I decided to iterate through all functions 
> and
> variables, like in the patch bellow.  But it doesn't work for static vars,
> declared inside functions, because they do not appear in symtab :(

Ping?  Where should I set node->offloadable for "omp declare target to (list)"
functions, global and static vars?

Thanks,
  -- Ilya


Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-11-02 Thread Ilya Verbin
On Fri, Oct 30, 2015 at 20:12:25 +0100, Jakub Jelinek wrote:
> On Fri, Oct 30, 2015 at 08:44:07PM +0300, Ilya Verbin wrote:
> > On Wed, Oct 28, 2015 at 00:11:03 +0300, Ilya Verbin wrote:
> > > On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > > > As the testcases show, #pragma omp declare target has now a new form 
> > > > (well,
> > > > two; with some issues on it pending), where it is used just as a single
> > > > declarative directive rather than a pair of them and allows marking
> > > > vars and functions by name as "omp declare target" vars/functions 
> > > > (which the
> > > > middle-end etc. already handles),
> > > 
> > > There is an issue - such variables are not added to the offloading tables,
> > > because when varpool_node::get_create is called for the first time, the 
> > > variable
> > > doesn't yet have "omp declare target" attribute, and when it's called for 
> > > the
> > > second time, it just returns existing node.  Functions also aren't marked 
> > > as
> > > offloadable.  I tried to fix this by moving the code from
> > > varpool_node::get_create to varpool_node::finalize_decl, but it helped 
> > > only C,
> > > but doesn't fix C++.  Therefore, I decided to iterate through all 
> > > functions and
> > > variables, like in the patch bellow.  But it doesn't work for static vars,
> > > declared inside functions, because they do not appear in symtab :(
> > 
> > Ping?  Where should I set node->offloadable for "omp declare target to 
> > (list)"
> > functions, global and static vars?
> 
> Perhaps already somewhere in the FEs?  I mean, when the varpool node is
> created after the decl has that attribute, it already should set offsetable
> itself, so perhaps when adding the attribute check if corresponding varpool
> node exists already (but don't create it) and if yes, set offloadable?

Here is the patch.
make check RUNTESTFLAGS=gomp.exp and check-target-libgomp passed.
OK for gomp-4_5-branch?


gcc/c/
* c-parser.c: Include context.h.
(c_parser_omp_declare_target): If decl has "omp declare target" or
"omp declare target link" attribute, and cgraph or varpool node already
exists, then set corresponding flags.
gcc/cp/
* parser.c: Include context.h.
(cp_parser_omp_declare_target): If decl has "omp declare target" or
"omp declare target link" attribute, and cgraph or varpool node already
exists, then set corresponding flags.
libgomp/
* testsuite/libgomp.c++/target-13.C: Add global variable with "omp
declare target ()" directive, use it in foo.
* testsuite/libgomp.c/target-28.c: Likewise.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index a169457..049417c 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gomp-constants.h"
 #include "c-family/c-indentation.h"
 #include "gimple-expr.h"
+#include "context.h"
 
 
 /* Initialization routine for this file.  */
@@ -15600,7 +15601,22 @@ c_parser_omp_declare_target (c_parser *parser)
  continue;
}
   if (!at1)
-   DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+   {
+ symtab_node *node = symtab_node::get (t);
+ DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+ if (node != NULL)
+   {
+ node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
+ g->have_offload = true;
+ if (is_a  (node))
+   {
+ vec_safe_push (offload_vars, t);
+ node->force_output = 1;
+   }
+#endif
+   }
+   }
 }
 }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a374e6c..de77a4b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-low.h"
 #include "gomp-constants.h"
 #include "c-family/c-indentation.h"
+#include "context.h"
 
 
 /* The lexer.  */
@@ -34773,7 +34774,22 @@ cp_parser_omp_declare_target (cp_parser *parser, 
cp_token *pragma_tok)
  continue;
}
   if (!at1)
-   DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+   {
+ symtab_node *node = symtab_node::get (t);
+ DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+ if (node != NULL)
+   {
+ node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
+ g->have_off

Re: [ptx] partitioning optimization

2015-11-10 Thread Ilya Verbin
> I've been unable to introduce a testcase for this. The difficulty is we want
> to check an rtl dump from the acceleration compiler, and there doesn't
> appear to be existing machinery for that in the testsuite.  Perhaps
> something to be added later?

I haven't tried it, but doesn't
/* { dg-options "-foffload=-fdump-rtl-..." } */
with
/* { dg-final { scan-rtl-dump ... } } */
work?

  -- Ilya


Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Ilya Verbin
On Wed, Nov 11, 2015 at 17:52:22 +0100, Jakub Jelinek wrote:
> On Mon, Oct 19, 2015 at 10:47:54PM +0300, Ilya Verbin wrote:
> > So, here is what I have for now.  Attached target-29.c testcase works fine 
> > with
> > MIC emul, however I don't know how to (and where) properly check for 
> > completion
> > of async execution on target.  And, similarly, where to do unmapping after 
> > that?
> > Do we need a callback from plugin to libgomp (as far as I understood, PTX
> > runtime supports this, but HSA doesn't), or libgomp will just check for
> > ttask->is_completed in task.c?
> 
> Here is the patch updated to have a task.c defined function that the plugin
> can call upon completion of async offloading exection.

Thanks.

> The testsuite coverage will need to improve, the testcase is wrong
> (contains data races - if you want to test parallel running of two target
> regions that both touch the same var, I'd say best would be to use
> #pragma omp atomic and or in 4 in one case and 1 in another case, then
> test if result is 5 (and similarly for the other var).
> Also, with the usleeps Alex Monakov will be unhappy because PTX newlib does
> not have it, but we'll need to find some solution for that.
> 
> Another thing to work on beyond testsuite coverage (it is desirable to test
> nowait target tasks (both depend and without depend) being awaited in all
> the various waiting spots, i.e. end of parallel, barrier, taskwait, end of
> taskgroup, or if (0) task with depend clause waiting on that.
> 
> Also, I wonder what to do if #pragma omp target nowait is used outside of
> (host) parallel - when team is NULL.  All the tasking code in that case just
> executes tasks undeferred, which is fine for all but target nowait - there
> it is I'd say useful to be able to run a single host thread concurrently
> with some async offloading tasks.  So, I wonder if in that case,
> if we encounter target nowait with team == NULL, should not just create a
> dummy non-active (nthreads == 1) team, as if there was #pragma omp parallel
> if (0) starting above it and ending at program's end.  In OpenMP, the
> program's initial thread is implicitly surrounded by inactive parallel, so
> this isn't anything against the OpenMP execution model.  But we'd need to
> free the team somewhere in a destructor.
>
> Can you please try to cleanup the liboffloadmic side of this, so that
> a callback instead of hardcoded __gomp_offload_intelmic_async_completed call
> is used?

Do you mean something like the patch bellow?  I'll discuss it with liboffloadmic
maintainers.

> Can you make sure it works on XeonPhi non-emulated too?

I'm trying to do it, but it will take some time...

Unfortunately, target-32.c fails for me using emulation mode:

Program received signal SIGSEGV, Segmentation fault.
#0  0x7ff4ab1265ed in priority_list_remove (list=0x0, node=0x7ff49001afa0, 
model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:422
#1  0x7ff4ab1266d9 in priority_tree_remove (type=PQ_CHILDREN, 
head=0x1883138, node=0x7ff49001afa0) at libgomp/priority_queue.c:195
#2  0x7ff4ab10fa06 in priority_queue_remove (type=PQ_CHILDREN, 
head=0x1883138, task=0x7ff49001af30, model=MEMMODEL_RELAXED) at 
libgomp/priority_queue.h:468
#3  0x7ff4ab11570d in gomp_task_maybe_wait_for_dependencies 
(depend=0x7ff49b0d9de0) at libgomp/task.c:1539
#4  0x7ff4ab11fd46 in GOMP_target_enter_exit_data (device=-1, mapnum=3, 
hostaddrs=0x7ff49b0d9dc0, sizes=0x6020b0 <.omp_data_sizes.38>, kinds=0x6020a0 
<.omp_data_kinds.39>, flags=2, depend=0x7ff49b0d9de0) at libgomp/target.c:1662
#5  0x004011f9 in main._omp_fn ()
#6  0x7ff4ab1160f3 in gomp_thread_start (xdata=0x7fffe93766a0) at 
libgomp/team.c:119
#7  0x003b07e07ee5 in start_thread () from /lib64/libpthread.so.0
#8  0x003b076f4b8d in clone () from /lib64/libc.so.6

However when I manually run commands from testsuite/libgomp.log under the same
environment, it passes.  Don't know where is the difference.

Also I tried to replace 'b = 4;' and 'b = 5;' with infinite loops, but got only
100% CPU usage in offload_target_main instead of 200%, so it seems that only one
target task is running concurrently.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 6da09b1..772e198 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -220,6 +220,10 @@ static void
 register_main_image ()
 {
   __offload_register_image (&main_target_image);
+
+  /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
+ asynchronous task on target is completed.  */
+  __offload_register_task_callback (GOMP_PLUGIN_target_task_compl

Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Ilya Verbin
On Thu, Nov 12, 2015 at 18:58:22 +0100, Jakub Jelinek wrote:
> > Unfortunately, target-32.c fails for me using emulation mode:
> 
> I haven't managed to get it stuck yet (unlike the target-33.c one, see
> another mail), what OMP_NUM_THREADS you are using
> and how many cores/threads?

OMP_NUM_THREADS isn't set.  40 cores.

  -- Ilya


Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Ilya Verbin
On Thu, Nov 12, 2015 at 18:45:09 +0100, Jakub Jelinek wrote:
> But the testcase I wrote (target-33.c) hangs, the problem is in the
>   #pragma omp target nowait map (tofrom: a, b) depend(out: d[3])
>   {
> #pragma omp atomic update
> a = a + 9;
> b -= 8;
>   }
>   #pragma omp target nowait map (tofrom: a, c) depend(out: d[4])
>   {
> #pragma omp atomic update
> a = a + 4;
> c >>= 1;
>   }
>   #pragma omp task if (0) depend (in: d[3], d[4])
>   if (a != 50 || b != 4 || c != 20)
> abort ();
> part, where (I should change that for the case of no dependencies
> eventually) the task with map_vars+async_run is queued in both cases,
> then we reach GOMP_task, which calls gomp_task_maybe_wait_for_dependencies
> which spawns the first half task (map_vars+async_run), and then
> the second half task (map_vars+async_run), but that one gets stuck somewhere
> in liboffloadmic, then some other thread (from liboffloadmic) calls
> GOMP_PLUGIN_target_task_completion and enqueues the second half of the first
> target task (unmap_vars), but as the only normal thread in the main program
> is stuck in liboffloadmic (during gomp_map_vars, trying to allocate
> target memory in the plugin), there is no thread to schedule the second half
> of first target task.  So, if liboffloadmic is stuck waiting for unmap_vars,
> it is a deadlock.  Can you please try to debug this?

I'm unable to reproduce the hang (have tried various values of OMP_NUM_THREADS).
The testcase just aborts at (a != 50 || b != 4 || c != 20), because
a == 37, b == 12, c == 40.

BTW, don't know is this a bug or not:
Conditional jump or move depends on uninitialised value(s)
   at 0x4C2083D: priority_queue_insert (priority_queue.h:347)
   by 0x4C24DF9: GOMP_PLUGIN_target_task_completion (task.c:678)

  -- Ilya


Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Ilya Verbin
On Fri, Nov 13, 2015 at 16:11:50 +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 11:18:41AM +0100, Jakub Jelinek wrote:
> > For the offloading case, I actually see a problematic spot, namely that
> > GOMP_PLUGIN_target_task_completion could finish too early, and get the
> > task_lock before the thread that run the gomp_target_task_fn doing map_vars
> > + async_run for it.  Bet I need to add further ttask state kinds and deal
> > with that case (so GOMP_PLUGIN_target_task_completion would just take the
> > task lock and tweak ttask state if it has not been added to the queues
> > yet).
> > Plus I think I want to improve the case where we are not waiting, in
> > gomp_create_target_task if not waiting for dependencies actually schedule
> > manually the gomp_target_task_fn.
> 
> These two have been resolved, plus target-34.c issue resolved too (the bug
> was that I've been too lazy and just put target-33.c test into #pragma omp
> parallel #pragma omp single, but that is invalid OpenMP, as single is a
> worksharing region and #pragma omp barrier may not be encountered in such a
> region.  Fixed by rewriting the testcase.
> 
> So here is a full patch that passes for me both non-offloading and
> offloading, OMP_NUM_THREADS=16 (implicit on my box) as well as
> OMP_NUM_THREADS=1 (explicit).  I've incorporated your incremental patch.
> 
> One option to avoid the static variable would be to pass two pointers
> instead of one (async_data), one would be the callback function pointer,
> another argument to it.  Or another possibility would be to say that
> the async_data argument the plugin passes to liboffloadmic would be
> pointer to structure, holding a function pointer (completion callback)
> and the data pointer to pass to it, and then the plugin would just
> GOMP_PLUGIN_malloc 2 * sizeof (void *) for it, fill it in and
> register some function in itself that would call the
> GOMP_PLUGIN_target_task_completion with the second structure element
> as argument and then free the structure pointer.

I don't know which interface to implement to maintain compatibility in the
future.
Anyway, currently it's impossible that a process will use the same liboffloadmic
for 2 different offloading paths (say GCC's in exec and ICC's in a dso), because
in fact GCC's and ICC's libraries are not the same.  First of all, they have
different names: liboffloadmic in GCC and just liboffload in ICC.  And most
importantly, ICC's version contains some references to libiomp5, which were
removed form GCC's version.  In theory, we want to use one library with all
compilers, but I'm not sure when it will be possible.

> Do you get still crashes on any of the testcases with this?

No, all tests now pass using emul.  I'll report when I have any results on HW.

Thanks,
  -- Ilya


Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Ilya Verbin
On Fri, Nov 13, 2015 at 17:41:53 +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 07:37:17PM +0300, Ilya Verbin wrote:
> > I don't know which interface to implement to maintain compatibility in the
> > future.
> > Anyway, currently it's impossible that a process will use the same 
> > liboffloadmic
> > for 2 different offloading paths (say GCC's in exec and ICC's in a dso), 
> > because
> > in fact GCC's and ICC's libraries are not the same.  First of all, they have
> > different names: liboffloadmic in GCC and just liboffload in ICC.  And most
> > importantly, ICC's version contains some references to libiomp5, which were
> > removed form GCC's version.  In theory, we want to use one library with all
> > compilers, but I'm not sure when it will be possible.
> 
> Ok, in that case it is less of a problem.
> 
> > > Do you get still crashes on any of the testcases with this?
> > 
> > No, all tests now pass using emul.  I'll report when I have any results on 
> > HW.
> 
> Perfect, I'll commit it to gomp-4_5-branch then.

make check-target-libgomp with offloading to HW also passed :)

And this:

+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -3,6 +3,7 @@
 
 int main ()
 {
+  int x = 1;
   int a = 0, b = 0, c = 0, d[7];
 
   #pragma omp parallel
@@ -18,6 +19,7 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
+  while (x);
   usleep (1000);
   #pragma omp atomic update
   b |= 4;
@@ -25,6 +27,7 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
+  while (x);
   usleep (5000);
   #pragma omp atomic update
   b |= 1;

demonstrates 200% CPU usage both using emul and HW, so 2 target tasks really run
concurrently.

  -- Ilya


[gomp4.5] Handle #pragma omp declare target link

2015-11-16 Thread Ilya Verbin
Hi!

On Mon, Oct 26, 2015 at 20:49:40 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 10:39:04PM +0300, Ilya Verbin wrote:
> > > Without declare target link or to, you can't use the global variables
> > > in orphaned accelerated routines (unless you e.g. take the address of the
> > > mapped variable in the region and pass it around).
> > > The to variables (non-deferred) are always mapped and are initialized with
> > > the original initializer, refcount is infinity.  link (deferred) work more
> > > like the normal mapping, referencing those vars when they aren't 
> > > explicitly
> > > (or implicitly) mapped is unspecified behavior, if it is e.g. mapped 
> > > freshly
> > > with to kind, it gets the current value of the host var rather than the
> > > original one.  But, beyond the mapping the compiler needs to ensure that
> > > all uses of the link global var (or perhaps just all uses of the link 
> > > global
> > > var outside of the target construct body where it is mapped, because you
> > > could use there the pointer you got from GOMP_target) are replaced by
> > > dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
> > > &*a_tmp, and that the runtime library during registration of the tables is
> > > told about the address of this artificial pointer.  During registration,
> > > I'd expect it would stick an entry for this range into the table, with 
> > > some
> > > special flag or something similar, indicating that it is deferred mapping
> > > and where the offloading device pointer is.  During mapping, it would map 
> > > it
> > > as any other not yet mapped object, but additionally would also set this
> > > device pointer to the device address of the mapped object.  We also need 
> > > to
> > > ensure that when we drop the refcount of that mapping back to 0, we get it
> > > back to the state where it is described as a range with registered 
> > > deferred
> > > mapping and where the device pointer is.
> > 
> > Ok, got it, I'll try implement this...
> 
> Thanks.
> 
> > > > > we actually replace the variables with pointers to variables, then 
> > > > > need
> > > > > to somehow also mark those in the offloading tables, so that the 
> > > > > library
> > > > 
> > > > I see 2 possible options: use the MSB of the size, or introduce the 
> > > > third field
> > > > for flags.
> > > 
> > > Well, it can be either recorded in the host variable tables (which contain
> > > address and size pair, right), or in corresponding offloading device table
> > > (which contains the pointer, something else?).
> > 
> > It contains a size too, which is checked in libgomp:
> >   gomp_fatal ("Can't map target variables (size mismatch)");
> > Yes, we can remove this check, and use second field in device table for 
> > flags.
> 
> Yeah, or e.g. just use MSB of that size (so check that either the size is
> the same (then it is target to) or it is MSB | size (then it is target link).
> Objects larger than half of the address space aren't really supportable
> anyway.

Here is WIP patch, not for check-in.  There are still many FIXMEs, which I am
going to resolve, however target-link-1.c testcase pass.
Is this approach correct?  Any comments on FIXMEs?


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 23d0107..58771c0 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15895,7 +15895,10 @@ c_parser_omp_declare_target (c_parser *parser)
  g->have_offload = true;
  if (is_a  (node))
{
- vec_safe_push (offload_vars, t);
+ omp_offload_var var;
+ var.decl = t;
+ var.link_ptr_decl = NULL_TREE;
+ vec_safe_push (offload_vars, var);
  node->force_output = 1;
}
 #endif
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d1f4970..b890f6d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -34999,7 +34999,10 @@ cp_parser_omp_declare_target (cp_parser *parser, 
cp_token *pragma_tok)
  g->have_offload = true;
  if (is_a  (node))
{
- vec_safe_push (offload_vars, t);
+ omp_offload_var var;
+ var.decl = t;
+ var.link_ptr_decl = NULL_TREE;
+ vec_safe_push (offload_vars, var);
  node->force_output = 1;
}
 #endif
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 

Re: [PATCH] Implement GOMP_OFFLOAD_unload_image in intelmic plugin

2015-11-16 Thread Ilya Verbin
On Tue, Sep 08, 2015 at 22:41:17 +0300, Ilya Verbin wrote:
> This patch supports unloading of target images from the device.
> Unfortunately __offload_unregister_image requires the whole descriptor for
> unloading, which must contain target code inside, for this reason the plugin
> keeps descriptors for all offloaded images in memory.
> Also the patch removes useless variable names, intended for debug purposes.
> Regtested with make check-target-libgomp and using a dlopen/dlclose test.
> OK for trunk?
> 
> liboffloadmic/
>   * plugin/libgomp-plugin-intelmic.cpp (struct TargetImageDesc): New.
>   (ImgDescMap): New typedef.
>   (image_descriptors): New static var.
>   (init): Allocate image_descriptors.
>   (offload): Remove vars2 argument.  Pass NULL to __offload_offload1
>   instead of vars2.
>   (unregister_main_image): New static function.
>   (register_main_image): Call unregister_main_image at exit.
>   (GOMP_OFFLOAD_init_device): Print device number, fix offload args.
>   (GOMP_OFFLOAD_fini_device): Likewise.
>   (get_target_table): Remove vd1g and vd2g, don't pass them to offload.
>   (offload_image): Remove declaration of the struct TargetImage.
>   Free table.  Insert new descriptor into image_descriptors.
>   (GOMP_OFFLOAD_unload_image): Call __offload_unregister_image, free
>   the corresponding descriptor, and remove it from address_table and
>   image_descriptors.
>   (GOMP_OFFLOAD_alloc): Print device number, remove vd1g.
>   (GOMP_OFFLOAD_free): Likewise.
>   (GOMP_OFFLOAD_host2dev): Print device number, remove vd1g and vd2g.
>   (GOMP_OFFLOAD_dev2host): Likewise.
>   (GOMP_OFFLOAD_run): Print device number, remove vd1g.
>   * plugin/offload_target_main.cpp (__offload_target_table_p1): Remove
>   vd2, don't pass it to __offload_target_enter.
>   (__offload_target_table_p2): Likewise.
>   (__offload_target_alloc): Likewise.
>   (__offload_target_free): Likewise.
>   (__offload_target_host2tgt_p1): Likewise.
>   (__offload_target_host2tgt_p2): Likewise.
>   (__offload_target_tgt2host_p1): Likewise.
>   (__offload_target_tgt2host_p2): Likewise.
>   (__offload_target_run): Likewise.

Ping?  Rebased and retested.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 772e198..6ee585e 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -65,6 +65,17 @@ typedef std::vector DevAddrVect;
 /* Addresses for all images and all devices.  */
 typedef std::map ImgDevAddrMap;
 
+/* Image descriptor needed by __offload_[un]register_image.  */
+struct TargetImageDesc {
+  int64_t size;
+  /* 10 characters is enough for max int value.  */
+  char name[sizeof ("lib00.so")];
+  char data[];
+} __attribute__ ((packed));
+
+/* Image descriptors, indexed by a pointer obtained from libgomp.  */
+typedef std::map ImgDescMap;
+
 
 /* Total number of available devices.  */
 static int num_devices;
@@ -76,6 +87,9 @@ static int num_images;
second key is number of device.  Contains a vector of pointer pairs.  */
 static ImgDevAddrMap *address_table;
 
+/* Descriptors of all images, registered in liboffloadmic.  */
+static ImgDescMap *image_descriptors;
+
 /* Thread-safe registration of the main image.  */
 static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
 
@@ -156,6 +170,7 @@ init (void)
 
 out:
   address_table = new ImgDevAddrMap;
+  image_descriptors = new ImgDescMap;
   num_devices = _Offload_number_of_devices ();
 }
 
@@ -192,14 +207,13 @@ GOMP_OFFLOAD_get_num_devices (void)
 
 static void
 offload (const char *file, uint64_t line, int device, const char *name,
-int num_vars, VarDesc *vars, VarDesc2 *vars2, const void **async_data)
+int num_vars, VarDesc *vars, const void **async_data)
 {
   OFFLOAD ofld = __offload_target_acquire1 (&device, file, line);
   if (ofld)
 {
   if (async_data == NULL)
-   __offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL,
-   NULL);
+   __offload_offload1 (ofld, name, 0, num_vars, vars, NULL, 0, NULL, NULL);
   else
{
  OffloadFlags flags;
@@ -217,13 +231,27 @@ offload (const char *file, uint64_t line, int device, 
const char *name,
 }
 
 static void
+unregister_main_image ()
+{
+  __offload_unregister_image (&main_target_image);
+}
+
+static void
 register_main_image ()
 {
+  /* Do not check the return value, because old versions of liboffloadmic did
+ not have return values.  */
   __offload_register_image (&main_target_image);
 
   /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
  asynchronous task on target is completed.  */
   __offlo

Re: libgomp: Compile-time error for non-portable gomp_mutex_t initialization

2015-11-18 Thread Ilya Verbin
On Fri, Sep 25, 2015 at 17:28:25 +0200, Jakub Jelinek wrote:
> On Fri, Sep 25, 2015 at 05:04:47PM +0200, Thomas Schwinge wrote:
> > On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin  wrote:
> > > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > > of mutex guarding gomp_target_init (which is using pthread_once 
> > > > guaranteed
> > > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > > (if those are run from ctors, then I guess something like dl_load_lock
> > > > ensures at least on glibc that multiple GOMP_offload_register calls 
> > > > aren't
> > > > performed at the same time) in accessing/reallocating offload_images
> > > > and num_offload_images and the lack of support to register further
> > > > images after the gomp_target_init call (if you dlopen further shared
> > > > libraries) is really bad.  And it would be really nice to support the
> > > > unloading.
> > 
> > > Here is the latest patch for libgomp and mic plugin.
> > 
> > > libgomp/
> > 
> > >   * target.c (register_lock): New mutex for offload image registration.
> > 
> > >   (GOMP_offload_register): Add mutex lock.
> 
> That is definitely wrong.  You'd totally break --disable-linux-futex support
> on linux and bootstrap on e.g. Solaris and various other pthread targets.

I don't quite understand, do you mean that gcc 5 and trunk are broken, because
register_lock doesn't have initialization?  But it seems that bootstrap on
Solaris and other targets works fine...

> At least for ELF and dynamic linking, shared libraries that contain
> constructors that call GOMP_offload_register* should have DT_NEEDED libgomp
> and thus libgomp's constructors should be run before the constructors of
> the libraries that call GOMP_offload_register*.

So, libgomp should contain a constructor, which will call gomp_mutex_init
(®ister_lock) before any call to GOMP_offload_register*, right?

> For the targets without known zero initializer for gomp_mutex_lock, either
> there is an option to use pthread_once to make sure it is initialized once,
> or there is an option to define a macro like GOMP_MUTEX_INITIALIZER,
> defined to PTHREAD_MUTEX_INITIALIZER in config/posix/mutex.h and to
> { 0 } in config/linux/mutex.h and something like {} or whatever in
> config/rtems/mutex.h.  Then for the non-automatic non-heap
> gomp_mutex_t's you could just initialize them in their initializers
> with GOMP_MUTEX_INITIALIZER.

  -- Ilya


Re: [gomp4.1] Handle linear clause modifiers in declare simd

2015-11-18 Thread Ilya Verbin
Hi!

On Wed, Jul 01, 2015 at 12:55:38 +0200, Jakub Jelinek wrote:
> I've committed following patch, which per the new ABI additions
> mangles and handles the various new linear clause modifiers in
> declare simd functions.  The vectorizer side is not done yet,
>
> [...]
>
> @@ -14195,12 +14216,25 @@ simd_clone_mangle (struct cgraph_node *n
>  {
>struct cgraph_simd_clone_arg arg = clone_info->args[n];
>  
> -  if (arg.arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
> - pp_character (&pp, 'u');
> -  else if (arg.arg_type == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
> +  switch (arg.arg_type)
>   {
> -   gcc_assert (arg.linear_step != 0);
> + case SIMD_CLONE_ARG_TYPE_UNIFORM:
> +   pp_character (&pp, 'u');
> +   break;
> + case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
> pp_character (&pp, 'l');
> +   goto mangle_linear;
> + case SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP:
> +   pp_character (&pp, 'R');
> +   goto mangle_linear;
> + case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
> +   pp_character (&pp, 'L');
> +   goto mangle_linear;
> + case SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP:
> +   pp_character (&pp, 'U');
> +   goto mangle_linear;
> + mangle_linear:
> +   gcc_assert (arg.linear_step != 0);

Could you please point to where the new ABI additions are documented?
I can't find R/L/U parameter types in [1] and [2].

[1] 
https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
[2] https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4

Thanks,
  -- Ilya


Re: [PATCH] Implement GOMP_OFFLOAD_unload_image in intelmic plugin

2015-11-19 Thread Ilya Verbin
On Thu, Nov 19, 2015 at 14:33:06 +0100, Jakub Jelinek wrote:
> On Mon, Nov 16, 2015 at 08:33:28PM +0300, Ilya Verbin wrote:
> > diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
> > b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > index 772e198..6ee585e 100644
> > --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > @@ -65,6 +65,17 @@ typedef std::vector DevAddrVect;
> >  /* Addresses for all images and all devices.  */
> >  typedef std::map ImgDevAddrMap;
> >  
> > +/* Image descriptor needed by __offload_[un]register_image.  */
> > +struct TargetImageDesc {
> > +  int64_t size;
> > +  /* 10 characters is enough for max int value.  */
> > +  char name[sizeof ("lib00.so")];
> > +  char data[];
> > +} __attribute__ ((packed));
> 
> Why the packed attribute?  I know it is preexisting, but with int64_t
> being the first and then just char, there is no padding in between fields.

Hmmm, I can't remember, but I definitely have added this attribute 2 years ago,
because liboffloadmic failed to register the image.  Anyway, now everything
works fine without it.

> And to determine the size without data, you can just use offsetof.

I will add this:

diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 6ee585e..f8c1725 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -71,7 +71,7 @@ struct TargetImageDesc {
   /* 10 characters is enough for max int value.  */
   char name[sizeof ("lib00.so")];
   char data[];
-} __attribute__ ((packed));
+};
 
 /* Image descriptors, indexed by a pointer obtained from libgomp.  */
 typedef std::map ImgDescMap;
@@ -313,9 +313,8 @@ offload_image (const void *target_image)
 target_image, image_start, image_end);
 
   int64_t image_size = (uintptr_t) image_end - (uintptr_t) image_start;
-  TargetImageDesc *image
-= (TargetImageDesc *) malloc (sizeof (int64_t) + sizeof 
("lib00.so")
- + image_size);
+  TargetImageDesc *image = (TargetImageDesc *) malloc (offsetof 
(TargetImageDesc, data)
+  + image_size);
   if (!image)
 {
   fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);


> > @@ -217,13 +231,27 @@ offload (const char *file, uint64_t line, int device, 
> > const char *name,
> >  }
> >  
> >  static void
> > +unregister_main_image ()
> > +{
> > +  __offload_unregister_image (&main_target_image);
> > +}
> > +
> > +static void
> >  register_main_image ()
> >  {
> > +  /* Do not check the return value, because old versions of liboffloadmic 
> > did
> > + not have return values.  */
> >__offload_register_image (&main_target_image);
> >  
> >/* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
> >   asynchronous task on target is completed.  */
> >__offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
> > +
> > +  if (atexit (unregister_main_image) != 0)
> > +{
> > +  fprintf (stderr, "%s: atexit failed\n", __FILE__);
> > +  exit (1);
> > +}
> >  }
> 
> What is the point of this hunk?  Is there any point in unregistering the
> main target image?  I mean at that point the process is exiting anyway.
> The importance of unregistering target images registered from shared
> libraries is that they should be unregistered when they are dlclosed.

liboffloadmic performs correct finalization of the target process in
__offload_fini_library, which is called only during unregistration of the main
target image.
Without this finalization the target process will be destroyed after unloading
libcoi_host.so.  And then some DSO may call GOMP_offload_unregister_ver from its
destructor, which will try to unload target image from the already destroyed
process.  This issue is reproducible only using real COI.

  -- Ilya


Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2015-11-20 Thread Ilya Verbin
On Wed, Dec 10, 2014 at 01:48:21 +0300, Ilya Verbin wrote:
> On 09 Dec 14:59, Richard Biener wrote:
> > On Mon, 8 Dec 2014, Ilya Verbin wrote:
> > > Unfortunately, this fix was not general enough.
> > > There might be cases when mixed object files get into lto-wrapper, ie 
> > > some of
> > > them contain only LTO sections, some contain only offload sections, and 
> > > some
> > > contain both.  But when lto-wrapper will pass all these files to 
> > > recompilation,
> > > the compiler might crash (it depends on the order of input files), since 
> > > in
> > > read_cgraph_and_symbols it expects that *all* input files contain IR 
> > > section of
> > > given type.
> > > This patch splits input objects from argv into lto_argv and offload_argv, 
> > > so
> > > that all files in arrays contain corresponding IR.
> > > Similarly, in lto-plugin, it was bad idea to add objects, which contain 
> > > offload
> > > IR without LTO, to claimed_files, since this may corrupt a resolution 
> > > file.
> > > 
> > > Tested on various combinations of files with/without -flto and 
> > > with/without
> > > offload, using trunk ld and gold, also tested on ld without plugin 
> > > support.
> > > Bootstrap and make check passed on x86_64-linux and i686-linux.  Ok for 
> > > trunk?
> > 
> > Did you check that bootstrap-lto still works?  Ok if so.
> 
> Yes, bootstrap-lto passed.
> Committed revision 218543.

I don't know how I missed this a year ago, but mixing of LTO objects with
offloading-without-LTO objects still doesn't work :(
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463 filed about that.
Any thoughts how to fix this?

Thanks,
  -- Ilya


Re: Enable pointer TBAA for LTO

2015-11-23 Thread Ilya Verbin
On Mon, Nov 23, 2015 at 16:31:42 +0100, Richard Biener wrote:
> I think it also causes the following and one related ICE
> 
> FAIL: gcc.dg/vect/pr62021.c -flto -ffat-lto-objects (internal compiler 
> error)
> 
> /space/rguenther/src/svn/trunk3/gcc/testsuite/gcc.dg/vect/pr62021.c:7:1: 
> internal compiler error: in get_alias_set, at alias.c:880^M
> 0x7528a7 get_alias_set(tree_node*)^M
> /space/rguenther/src/svn/trunk3/gcc/alias.c:880^M
> 0x751ce5 component_uses_parent_alias_set_from(tree_node const*)^M
> /space/rguenther/src/svn/trunk3/gcc/alias.c:635^M
> 0x7522ad reference_alias_ptr_type_1^M
> /space/rguenther/src/svn/trunk3/gcc/alias.c:747^M
> 0x752683 get_alias_set(tree_node*)^M
> ...

And an ICE in intelmicemul offloading compiler:

FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.c++/for-13.C (internal compiler error)
FAIL: libgomp.c++/for-14.C (internal compiler error)
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c/for-5.c (internal compiler error)
FAIL: libgomp.c/for-6.c (internal compiler error)

libgomp/testsuite/libgomp.c/for-2.h:201:9: internal compiler error: in 
get_alias_set, at alias.c:880
0x710eef get_alias_set(tree_node*)
gcc/alias.c:880
0x71032d component_uses_parent_alias_set_from(tree_node const*)
gcc/alias.c:635
0x7108f5 reference_alias_ptr_type_1
gcc/alias.c:747
0x710ccb get_alias_set(tree_node*)
gcc/alias.c:843
0x89d208 expand_assignment(tree_node*, tree_node*, bool)
gcc/expr.c:5020
0x768ff7 expand_gimple_stmt_1
gcc/cfgexpand.c:3592
0x7693e2 expand_gimple_stmt
gcc/cfgexpand.c:3688
0x7704ed expand_gimple_basic_block
gcc/cfgexpand.c:5694
0x771ff1 execute
gcc/cfgexpand.c:6309
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

  -- Ilya


Re: [PATCH 12/12] always define ENABLE_OFFLOADING

2015-11-23 Thread Ilya Verbin
On Mon, Nov 09, 2015 at 19:41:21 +0100, Bernd Schmidt wrote:
> On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
> >-#ifdef ENABLE_OFFLOADING
> >/* If the user didn't specify any, default to all configured offload
> >   targets.  */
> >if (offload_targets == NULL)
> >  handle_foffload_option (OFFLOAD_TARGETS);
> >-#endif
> 
> This one I would keep guarded with an if.
> 
> Otherwise ok modulo stage 1 end.

There are 2 new uses of "#ifdef ENABLE_OFFLOADING" in c_parser_oacc_declare and
cp_parser_oacc_declare.
I don't know how to properly test OpenACC, so here is untested patch.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7b10764..1dc0bd5 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13473,14 +13473,15 @@ c_parser_oacc_declare (c_parser *parser)
  if (node != NULL)
{
  node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
- g->have_offload = true;
- if (is_a  (node))
+ if (ENABLE_OFFLOADING)
{
- vec_safe_push (offload_vars, decl);
- node->force_output = 1;
+ g->have_offload = true;
+ if (is_a  (node))
+   {
+ vec_safe_push (offload_vars, decl);
+ node->force_output = 1;
+   }
}
-#endif
}
}
}
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 24ed404..a9c0a45 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -34633,14 +34633,15 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token 
*pragma_tok)
  if (node != NULL)
{
  node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
- g->have_offload = true;
- if (is_a  (node))
+ if (ENABLE_OFFLOADING)
{
- vec_safe_push (offload_vars, decl);
- node->force_output = 1;
+ g->have_offload = true;
+ if (is_a  (node))
+   {
+ vec_safe_push (offload_vars, decl);
+ node->force_output = 1;
+   }
}
-#endif
}
}
}

  -- Ilya


  1   2   3   4   >