ABataev added inline comments.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:8739
+///     // For each component specified by this mapper:
+///     if (currentComponent.hasMapper())
+///       (*currentComponent.Mapper())(rt_mapper_handle, arg_base, arg_begin,
----------------
lildmh wrote:
> ABataev wrote:
> > lildmh wrote:
> > > ABataev wrote:
> > > > lildmh wrote:
> > > > > ABataev wrote:
> > > > > > lildmh wrote:
> > > > > > > ABataev wrote:
> > > > > > > > lildmh wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > Currently `currentComponent` is generated by the compiler. 
> > > > > > > > > > But can we instead pass this data as an extra parameter to 
> > > > > > > > > > this `omp_mapper` function.
> > > > > > > > > Emm, I think this scheme will be very difficult and 
> > > > > > > > > inefficient. If we pass components as an argument of 
> > > > > > > > > `omp_mapper` function, it means that the runtime needs to 
> > > > > > > > > generate all components related to a map clause. I don't 
> > > > > > > > > think the runtime is able to do that efficiently. On the 
> > > > > > > > > other hand, in the current scheme, these components are 
> > > > > > > > > naturally generated by the compiler, and the runtime only 
> > > > > > > > > needs to know the base pointer, pointer, type, size. etc.
> > > > > > > > With the current scheme, we may end with the code blowout. We 
> > > > > > > > need to generate very similar code for different types and 
> > > > > > > > variables. The worst thing here is that we will be unable to 
> > > > > > > > optimize this huge amount of code because the codegen relies on 
> > > > > > > > the runtime functions and the code cannot be inlined. That's 
> > > > > > > > why I would like to move as much as possible code to the 
> > > > > > > > runtime rather than to emit it in the compiler. 
> > > > > > > I understand your concerns. I think this is the best we can do 
> > > > > > > right now.
> > > > > > > 
> > > > > > > The most worrisome case will be when we have nested mappers 
> > > > > > > within each other. In this case, a mapper function will call 
> > > > > > > another mapper function. We can inline the inner mapper functions 
> > > > > > > in this scenario, so that these mapper function can be properly 
> > > > > > > optimized. As a result, I think the performance should be fine.
> > > > > > Instead, we can use indirect function calls passed in the array to 
> > > > > > the runtime. Do you think it is going to be slower? In your current 
> > > > > > scheme, we generate many runtime calls instead. Could you try to 
> > > > > > estimate the number of calls in cases if we'll call the mappers 
> > > > > > through the indirect function calls and in your cuurent scheme, 
> > > > > > where we need to call the runtime functions many times in each 
> > > > > > particular mapper?
> > > > > Hi Alexey,
> > > > > 
> > > > > Sorry I don't understand your idea. What indirect function calls do 
> > > > > you propose to be passed to the runtime? What are these functions 
> > > > > supposed to do?
> > > > > 
> > > > > The number of function calls will be exactly equal to the number of 
> > > > > components mapped, no matter whether there are nested mappers or not. 
> > > > > The number of components depend on the program. E.g., if we map a 
> > > > > large array section, then there will be many more function calls.
> > > > I mean the pointers to the mapper function, generated by the compiler. 
> > > > In your comment, it is `c.Mapper()`
> > > If we pass nested mapper functions to the runtime, I think it will slow 
> > > down execution because of the extra level of indirect function calls. 
> > > E.g., the runtime will call `omp_mapper1`, which calls the runtime back, 
> > > which calls `omp_mapper2`, .... This can result in a deep call stack.
> > > 
> > > I think the current implementation will be more efficient, which doesn't 
> > > pass nested mappers to the runtime. One call to the outer most mapper 
> > > function will have all data mapping done. The call stack will be 2 level 
> > > deep (the first level is the mapper function, and the second level is 
> > > `__tgt_push_mapper_component`) in this case from the runtime. There are 
> > > also more compiler optimization space when we inline all nested mapper 
> > > functions.
> > Yes, if we leave it as is. But if instead of the bunch unique functions 
> > we'll have the common one, that accept list if indirect pointers to 
> > functions additionally, and move it to the runtime library, we won't need 
> > those 2 functions we have currently. We'll have full access to the mapping 
> > data vector in the runtime library and won't need to use those 2 accessors 
> > we have currently. Instead, we'll need just one runtime functions, which 
> > implements the whole mapping logic. We still need to call it recursively, 
> > but I assume the number of calls will remain the same as in the current 
> > scheme. Did you understand the idea? If yes, it would good if you coild try 
> > to estimate the number of function calls in current scheme and in this new 
> > scheme to estimate possible pros and cons.
> Hi Alexey,
> 
> Could you give an example for this scheme? 1) I don't understand how the 
> mapper function can have full access to the mapping data vector without 
> providing these 2 accessors. 2) I don't think it is possible to have a common 
> function instead of bunch of unique functions for each mapper declared.
Hi Lingda, something like this.
```
void __tgt_mapper(void *base, void *begin, size_t size, int64_t type, auto 
components[]) {
  // Allocate space for an array section first.
  if (size > 1 && !maptype.IsDelete)
     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
   
  // Map members.
  for (unsigned i = 0; i < size; i++) {
     // For each component specified by this mapper:
     for (auto c : components) {
       if (c.hasMapper())
         (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
       else
         <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
     }
  }
  // Delete the array section.
  if (size > 1 && maptype.IsDelete)
    <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
}

void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
 auto sub_components[] = {...};
 __tgt_mapper(base, begin, size, type, sub_components);
}
```



================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:747
+  // Call to void __tgt_push_mapper_component(void *rt_mapper_handle, void
+  // *base, void *begin, size_t size, int64_t type);
+  OMPRTL__tgt_push_mapper_component,
----------------
Check the declaration of the runtime function in the runtime patch and here. 
`size` parameter here has type `size_t`, while in runtime part it is `int64_t`


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59474/new/

https://reviews.llvm.org/D59474



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to