ABataev added inline comments.
================ Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:8739 +/// // For each component specified by this mapper: +/// if (currentComponent.hasMapper()) +/// (*currentComponent.Mapper())(rt_mapper_handle, arg_base, arg_begin, ---------------- lildmh wrote: > ABataev wrote: > > lildmh wrote: > > > ABataev wrote: > > > > lildmh wrote: > > > > > ABataev wrote: > > > > > > lildmh wrote: > > > > > > > ABataev wrote: > > > > > > > > lildmh wrote: > > > > > > > > > ABataev wrote: > > > > > > > > > > Currently `currentComponent` is generated by the compiler. > > > > > > > > > > But can we instead pass this data as an extra parameter to > > > > > > > > > > this `omp_mapper` function. > > > > > > > > > Emm, I think this scheme will be very difficult and > > > > > > > > > inefficient. If we pass components as an argument of > > > > > > > > > `omp_mapper` function, it means that the runtime needs to > > > > > > > > > generate all components related to a map clause. I don't > > > > > > > > > think the runtime is able to do that efficiently. On the > > > > > > > > > other hand, in the current scheme, these components are > > > > > > > > > naturally generated by the compiler, and the runtime only > > > > > > > > > needs to know the base pointer, pointer, type, size. etc. > > > > > > > > With the current scheme, we may end with the code blowout. We > > > > > > > > need to generate very similar code for different types and > > > > > > > > variables. The worst thing here is that we will be unable to > > > > > > > > optimize this huge amount of code because the codegen relies on > > > > > > > > the runtime functions and the code cannot be inlined. That's > > > > > > > > why I would like to move as much as possible code to the > > > > > > > > runtime rather than to emit it in the compiler. > > > > > > > I understand your concerns. I think this is the best we can do > > > > > > > right now. > > > > > > > > > > > > > > The most worrisome case will be when we have nested mappers > > > > > > > within each other. In this case, a mapper function will call > > > > > > > another mapper function. We can inline the inner mapper functions > > > > > > > in this scenario, so that these mapper function can be properly > > > > > > > optimized. As a result, I think the performance should be fine. > > > > > > Instead, we can use indirect function calls passed in the array to > > > > > > the runtime. Do you think it is going to be slower? In your current > > > > > > scheme, we generate many runtime calls instead. Could you try to > > > > > > estimate the number of calls in cases if we'll call the mappers > > > > > > through the indirect function calls and in your cuurent scheme, > > > > > > where we need to call the runtime functions many times in each > > > > > > particular mapper? > > > > > Hi Alexey, > > > > > > > > > > Sorry I don't understand your idea. What indirect function calls do > > > > > you propose to be passed to the runtime? What are these functions > > > > > supposed to do? > > > > > > > > > > The number of function calls will be exactly equal to the number of > > > > > components mapped, no matter whether there are nested mappers or not. > > > > > The number of components depend on the program. E.g., if we map a > > > > > large array section, then there will be many more function calls. > > > > I mean the pointers to the mapper function, generated by the compiler. > > > > In your comment, it is `c.Mapper()` > > > If we pass nested mapper functions to the runtime, I think it will slow > > > down execution because of the extra level of indirect function calls. > > > E.g., the runtime will call `omp_mapper1`, which calls the runtime back, > > > which calls `omp_mapper2`, .... This can result in a deep call stack. > > > > > > I think the current implementation will be more efficient, which doesn't > > > pass nested mappers to the runtime. One call to the outer most mapper > > > function will have all data mapping done. The call stack will be 2 level > > > deep (the first level is the mapper function, and the second level is > > > `__tgt_push_mapper_component`) in this case from the runtime. There are > > > also more compiler optimization space when we inline all nested mapper > > > functions. > > Yes, if we leave it as is. But if instead of the bunch unique functions > > we'll have the common one, that accept list if indirect pointers to > > functions additionally, and move it to the runtime library, we won't need > > those 2 functions we have currently. We'll have full access to the mapping > > data vector in the runtime library and won't need to use those 2 accessors > > we have currently. Instead, we'll need just one runtime functions, which > > implements the whole mapping logic. We still need to call it recursively, > > but I assume the number of calls will remain the same as in the current > > scheme. Did you understand the idea? If yes, it would good if you coild try > > to estimate the number of function calls in current scheme and in this new > > scheme to estimate possible pros and cons. > Hi Alexey, > > Could you give an example for this scheme? 1) I don't understand how the > mapper function can have full access to the mapping data vector without > providing these 2 accessors. 2) I don't think it is possible to have a common > function instead of bunch of unique functions for each mapper declared. Hi Lingda, something like this. ``` void __tgt_mapper(void *base, void *begin, size_t size, int64_t type, auto components[]) { // Allocate space for an array section first. if (size > 1 && !maptype.IsDelete) <push>(base, begin, size*sizeof(Ty), clearToFrom(type)); // Map members. for (unsigned i = 0; i < size; i++) { // For each component specified by this mapper: for (auto c : components) { if (c.hasMapper()) (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type); else <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type); } } // Delete the array section. if (size > 1 && maptype.IsDelete) <push>(base, begin, size*sizeof(Ty), clearToFrom(type)); } void <type>.mapper(void *base, void *begin, size_t size, int64_t type) { auto sub_components[] = {...}; __tgt_mapper(base, begin, size, type, sub_components); } ``` ================ Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:747 + // Call to void __tgt_push_mapper_component(void *rt_mapper_handle, void + // *base, void *begin, size_t size, int64_t type); + OMPRTL__tgt_push_mapper_component, ---------------- Check the declaration of the runtime function in the runtime patch and here. `size` parameter here has type `size_t`, while in runtime part it is `int64_t` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D59474/new/ https://reviews.llvm.org/D59474 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits