lildmh marked an inline comment as done.
lildmh added inline comments.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:8739
+///     // For each component specified by this mapper:
+///     if (currentComponent.hasMapper())
+///       (*currentComponent.Mapper())(rt_mapper_handle, arg_base, arg_begin,
----------------
ABataev wrote:
> lildmh wrote:
> > ABataev wrote:
> > > lildmh wrote:
> > > > ABataev wrote:
> > > > > lildmh wrote:
> > > > > > ABataev wrote:
> > > > > > > lildmh wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > Currently `currentComponent` is generated by the compiler. 
> > > > > > > > > But can we instead pass this data as an extra parameter to 
> > > > > > > > > this `omp_mapper` function.
> > > > > > > > Emm, I think this scheme will be very difficult and 
> > > > > > > > inefficient. If we pass components as an argument of 
> > > > > > > > `omp_mapper` function, it means that the runtime needs to 
> > > > > > > > generate all components related to a map clause. I don't think 
> > > > > > > > the runtime is able to do that efficiently. On the other hand, 
> > > > > > > > in the current scheme, these components are naturally generated 
> > > > > > > > by the compiler, and the runtime only needs to know the base 
> > > > > > > > pointer, pointer, type, size. etc.
> > > > > > > With the current scheme, we may end with the code blowout. We 
> > > > > > > need to generate very similar code for different types and 
> > > > > > > variables. The worst thing here is that we will be unable to 
> > > > > > > optimize this huge amount of code because the codegen relies on 
> > > > > > > the runtime functions and the code cannot be inlined. That's why 
> > > > > > > I would like to move as much as possible code to the runtime 
> > > > > > > rather than to emit it in the compiler. 
> > > > > > I understand your concerns. I think this is the best we can do 
> > > > > > right now.
> > > > > > 
> > > > > > The most worrisome case will be when we have nested mappers within 
> > > > > > each other. In this case, a mapper function will call another 
> > > > > > mapper function. We can inline the inner mapper functions in this 
> > > > > > scenario, so that these mapper function can be properly optimized. 
> > > > > > As a result, I think the performance should be fine.
> > > > > Instead, we can use indirect function calls passed in the array to 
> > > > > the runtime. Do you think it is going to be slower? In your current 
> > > > > scheme, we generate many runtime calls instead. Could you try to 
> > > > > estimate the number of calls in cases if we'll call the mappers 
> > > > > through the indirect function calls and in your cuurent scheme, where 
> > > > > we need to call the runtime functions many times in each particular 
> > > > > mapper?
> > > > Hi Alexey,
> > > > 
> > > > Sorry I don't understand your idea. What indirect function calls do you 
> > > > propose to be passed to the runtime? What are these functions supposed 
> > > > to do?
> > > > 
> > > > The number of function calls will be exactly equal to the number of 
> > > > components mapped, no matter whether there are nested mappers or not. 
> > > > The number of components depend on the program. E.g., if we map a large 
> > > > array section, then there will be many more function calls.
> > > I mean the pointers to the mapper function, generated by the compiler. In 
> > > your comment, it is `c.Mapper()`
> > If we pass nested mapper functions to the runtime, I think it will slow 
> > down execution because of the extra level of indirect function calls. E.g., 
> > the runtime will call `omp_mapper1`, which calls the runtime back, which 
> > calls `omp_mapper2`, .... This can result in a deep call stack.
> > 
> > I think the current implementation will be more efficient, which doesn't 
> > pass nested mappers to the runtime. One call to the outer most mapper 
> > function will have all data mapping done. The call stack will be 2 level 
> > deep (the first level is the mapper function, and the second level is 
> > `__tgt_push_mapper_component`) in this case from the runtime. There are 
> > also more compiler optimization space when we inline all nested mapper 
> > functions.
> Yes, if we leave it as is. But if instead of the bunch unique functions we'll 
> have the common one, that accept list if indirect pointers to functions 
> additionally, and move it to the runtime library, we won't need those 2 
> functions we have currently. We'll have full access to the mapping data 
> vector in the runtime library and won't need to use those 2 accessors we have 
> currently. Instead, we'll need just one runtime functions, which implements 
> the whole mapping logic. We still need to call it recursively, but I assume 
> the number of calls will remain the same as in the current scheme. Did you 
> understand the idea? If yes, it would good if you coild try to estimate the 
> number of function calls in current scheme and in this new scheme to estimate 
> possible pros and cons.
Hi Alexey,

Could you give an example for this scheme? 1) I don't understand how the mapper 
function can have full access to the mapping data vector without providing 
these 2 accessors. 2) I don't think it is possible to have a common function 
instead of bunch of unique functions for each mapper declared.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59474/new/

https://reviews.llvm.org/D59474



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to