On Wed, Sep 18, 2013 at 12:35:38PM +0400, Michael V. Zolotukhin wrote: > I merged my patch with recent changes in gomp4-branch, and the new version is > below. Also, I fixed most of your remarks - the one isn't fixed is checking > sizeof(void*)==sizeof(uintptr_t) in configure. I'll do it in the next patch. > > Is it ok for gomp4-branch? > > Also, I was thinking of how we could test such changes. Maybe we need to add > some logging stuff to all these libgomp routines - then we could check logs > and > thus test if everything works as expected. Otherwise it's really hard to find > out, whether offloading happened or not. What do you think?
The OpenMP standard has the omp_is_initial_device () function that can be used to query whether the code is offloaded or not. So I don't think we need to do the logging. For the device 257 hack we of course don't return that as true, but that is a hack that is going away. > @@ -50,6 +59,10 @@ struct target_mem_desc { > struct target_mem_desc *prev; > /* Number of items in following list. */ > size_t list_count; > + > + /* Corresponding target device descriptor. */ > + struct gomp_device_descr* device_descr; Please put the space before *, not after it. > + /* Plugin file name. */ > + char plugin_name[PATH_MAX]; I don't like such fixed size arrays, for most cases it will be big memory waste. What do you need the plugin_name for? And, if you really need it past dlopen, can't you store it as const char *plugin_name instead? > + > + /* Plugin file handler. */ > + void *plugin_handle; > + > + /* Function handlers. */ > + bool (*device_available_func) (void); The scan hook shouldn't give you just bool whether the device is available, but how many devices of that kind are available. You can have 2 MIC cards and one or two HSAIL GPGPU in a box e.g. Plus, is this hook useful after the initialization at all? I'd say it would be enough to just dlsym it during initialization, ask how many devices it has and just create that many device structures with that plugin_handle. What you want are hooks for device_alloc (taking size and align arguments, returning uintptr_t target address), device_free (taking uintptr_t target address and perhaps size), device_copyto (like memcpy, just with target address uintptr_t instead of void *) and device_copyfrom (similarly), and device_run hook or similar (taking host and target fn and target uintptr_t address of the block with pointers). > attribute_hidden int > gomp_get_num_devices (void) > { > - /* FIXME: Scan supported accelerators when called the first time. */ > - return 0; You need to call pthread_once here too, so that omp_get_num_devices returns the correct number. > + return num_devices; > } > > -static int > -resolve_device (int device) > +static struct gomp_device_descr* > +resolve_device (int device_id) > { > - if (device == -1) > + (void) pthread_once (&gomp_is_initialized, gomp_target_init); Thus, IMHO you should just call gomp_get_num_devices () here, or after the if (device_id == -1) block, and that will ensure gomp_target_init has been already called. Just save the return value into a temporary. > + if (device_id == -1) > { > struct gomp_task_icv *icv = gomp_icv (false); > - device = icv->default_device_var; > + device_id = icv->default_device_var; > } > /* FIXME: Temporary hack for testing non-shared address spaces on host. */ > - if (device == 257) > - return 257; > - if (device >= gomp_get_num_devices ()) > - return -1; > - return -1; > + if (device_id == 257) > + return &devices[0]; Guess the hack should be if gomp_get_num_devices () returned 0 and device_id == 257, otherwise the hack device won't be created. > @@ -137,15 +179,20 @@ gomp_map_vars_existing (splay_tree_key oldn, > splay_tree_key newn, > } > > static struct target_mem_desc * > -gomp_map_vars (size_t mapnum, void **hostaddrs, size_t *sizes, > - unsigned char *kinds, bool is_target) > +gomp_map_vars (struct gomp_device_descr* devicep, size_t mapnum, Again, please watch the formatting. > - struct target_mem_desc *tgt > - = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum); > + struct target_mem_desc *tgt = NULL; > + tgt = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum); Why this change? > tgt->list_count = mapnum; > tgt->refcount = 1; > + tgt->device_descr = devicep; > + > + if (!devicep) > + return tgt; Why this conditional? mapnum == 0 conditional below will do the trick. > if (mapnum == 0) > return tgt; > @@ -322,6 +373,8 @@ gomp_unmap_tgt (struct target_mem_desc *tgt) > static void > gomp_unmap_vars (struct target_mem_desc *tgt) > { > + struct gomp_device_descr* devicep = tgt->device_descr; > + Formatting (several other places too). > + /* FIXME: currently only device 257 is available and it is a hack which is > + done only to test the functionality early. We need to enable all > devices, > + not only this one. */ Yeah, I don't see why the FIXME is here, just use gomp_map_vars unconditionally, or conditionally on some flag in the device descr structure (whether device has non-shared address space). > + if (devicep->id == 257) > { > struct target_mem_desc *tgt > - = gomp_map_vars (mapnum, hostaddrs, sizes, kinds, true); > + = gomp_map_vars (devicep, mapnum, hostaddrs, sizes, kinds, true); > fn ((void *) tgt->tgt_start); And thus would be devicep->device_run hook. > gomp_unmap_vars (tgt); > } > @@ -437,8 +497,8 @@ void > GOMP_target_data (int device, size_t mapnum, void **hostaddrs, size_t *sizes, > unsigned char *kinds) > { > - device = resolve_device (device); > - if (device == -1) > + struct gomp_device_descr* devicep = resolve_device (device); > + if (devicep == NULL) > { > /* Host fallback. */ > struct gomp_task_icv *icv = gomp_icv (false); > @@ -449,17 +509,17 @@ GOMP_target_data (int device, size_t mapnum, void > **hostaddrs, size_t *sizes, > new #pragma omp target data, otherwise GOMP_target_end_data > would get out of sync. */ > struct target_mem_desc *tgt > - = gomp_map_vars (0, NULL, NULL, NULL, false); > + = gomp_map_vars (devicep, 0, NULL, NULL, NULL, false); Why devicep here, when you know it is NULL? > - if (device == 257) > + if (devicep->id == 257) Again. > GOMP_target_update (int device, size_t mapnum, void **hostaddrs, size_t > *sizes, > unsigned char *kinds) > { > - device = resolve_device (device); > - if (device == -1) > + struct gomp_device_descr* devicep = resolve_device (device); > + if (devicep == NULL) > return; > > - if (device == 257) > - gomp_update (mapnum, hostaddrs, sizes, kinds); > + if (devicep->id == 257) Likewise. > + strncpy (current_device.plugin_name, plugin_path, PATH_MAX); > + strcat (current_device.plugin_name, "/"); > + strcat (current_device.plugin_name, ent->d_name); Potential buffer overflow. > +/* This function initializes runtime needed for offloading. > + It loads plugins, sets up a connection with devices, etc. */ > +static void > +gomp_target_init (void) > +{ > + gomp_find_available_plugins (); > +} Why this indirection? Just rename gomp_find_available_plugins to gomp_target_init? Jakub