lildmh marked an inline comment as done. lildmh added inline comments.
================ Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:8739 +/// // For each component specified by this mapper: +/// if (currentComponent.hasMapper()) +/// (*currentComponent.Mapper())(rt_mapper_handle, arg_base, arg_begin, ---------------- ABataev wrote: > lildmh wrote: > > ABataev wrote: > > > lildmh wrote: > > > > ABataev wrote: > > > > > lildmh wrote: > > > > > > ABataev wrote: > > > > > > > lildmh wrote: > > > > > > > > ABataev wrote: > > > > > > > > > Currently `currentComponent` is generated by the compiler. > > > > > > > > > But can we instead pass this data as an extra parameter to > > > > > > > > > this `omp_mapper` function. > > > > > > > > Emm, I think this scheme will be very difficult and > > > > > > > > inefficient. If we pass components as an argument of > > > > > > > > `omp_mapper` function, it means that the runtime needs to > > > > > > > > generate all components related to a map clause. I don't think > > > > > > > > the runtime is able to do that efficiently. On the other hand, > > > > > > > > in the current scheme, these components are naturally generated > > > > > > > > by the compiler, and the runtime only needs to know the base > > > > > > > > pointer, pointer, type, size. etc. > > > > > > > With the current scheme, we may end with the code blowout. We > > > > > > > need to generate very similar code for different types and > > > > > > > variables. The worst thing here is that we will be unable to > > > > > > > optimize this huge amount of code because the codegen relies on > > > > > > > the runtime functions and the code cannot be inlined. That's why > > > > > > > I would like to move as much as possible code to the runtime > > > > > > > rather than to emit it in the compiler. > > > > > > I understand your concerns. I think this is the best we can do > > > > > > right now. > > > > > > > > > > > > The most worrisome case will be when we have nested mappers within > > > > > > each other. In this case, a mapper function will call another > > > > > > mapper function. We can inline the inner mapper functions in this > > > > > > scenario, so that these mapper function can be properly optimized. > > > > > > As a result, I think the performance should be fine. > > > > > Instead, we can use indirect function calls passed in the array to > > > > > the runtime. Do you think it is going to be slower? In your current > > > > > scheme, we generate many runtime calls instead. Could you try to > > > > > estimate the number of calls in cases if we'll call the mappers > > > > > through the indirect function calls and in your cuurent scheme, where > > > > > we need to call the runtime functions many times in each particular > > > > > mapper? > > > > Hi Alexey, > > > > > > > > Sorry I don't understand your idea. What indirect function calls do you > > > > propose to be passed to the runtime? What are these functions supposed > > > > to do? > > > > > > > > The number of function calls will be exactly equal to the number of > > > > components mapped, no matter whether there are nested mappers or not. > > > > The number of components depend on the program. E.g., if we map a large > > > > array section, then there will be many more function calls. > > > I mean the pointers to the mapper function, generated by the compiler. In > > > your comment, it is `c.Mapper()` > > If we pass nested mapper functions to the runtime, I think it will slow > > down execution because of the extra level of indirect function calls. E.g., > > the runtime will call `omp_mapper1`, which calls the runtime back, which > > calls `omp_mapper2`, .... This can result in a deep call stack. > > > > I think the current implementation will be more efficient, which doesn't > > pass nested mappers to the runtime. One call to the outer most mapper > > function will have all data mapping done. The call stack will be 2 level > > deep (the first level is the mapper function, and the second level is > > `__tgt_push_mapper_component`) in this case from the runtime. There are > > also more compiler optimization space when we inline all nested mapper > > functions. > Yes, if we leave it as is. But if instead of the bunch unique functions we'll > have the common one, that accept list if indirect pointers to functions > additionally, and move it to the runtime library, we won't need those 2 > functions we have currently. We'll have full access to the mapping data > vector in the runtime library and won't need to use those 2 accessors we have > currently. Instead, we'll need just one runtime functions, which implements > the whole mapping logic. We still need to call it recursively, but I assume > the number of calls will remain the same as in the current scheme. Did you > understand the idea? If yes, it would good if you coild try to estimate the > number of function calls in current scheme and in this new scheme to estimate > possible pros and cons. Hi Alexey, Could you give an example for this scheme? 1) I don't understand how the mapper function can have full access to the mapping data vector without providing these 2 accessors. 2) I don't think it is possible to have a common function instead of bunch of unique functions for each mapper declared. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D59474/new/ https://reviews.llvm.org/D59474 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits