ABataev added inline comments.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:8739
+///     // For each component specified by this mapper:
+///     if (currentComponent.hasMapper())
+///       (*currentComponent.Mapper())(rt_mapper_handle, arg_base, arg_begin,
----------------
lildmh wrote:
> ABataev wrote:
> > lildmh wrote:
> > > ABataev wrote:
> > > > lildmh wrote:
> > > > > ABataev wrote:
> > > > > > lildmh wrote:
> > > > > > > ABataev wrote:
> > > > > > > > Currently `currentComponent` is generated by the compiler. But 
> > > > > > > > can we instead pass this data as an extra parameter to this 
> > > > > > > > `omp_mapper` function.
> > > > > > > Emm, I think this scheme will be very difficult and inefficient. 
> > > > > > > If we pass components as an argument of `omp_mapper` function, it 
> > > > > > > means that the runtime needs to generate all components related 
> > > > > > > to a map clause. I don't think the runtime is able to do that 
> > > > > > > efficiently. On the other hand, in the current scheme, these 
> > > > > > > components are naturally generated by the compiler, and the 
> > > > > > > runtime only needs to know the base pointer, pointer, type, size. 
> > > > > > > etc.
> > > > > > With the current scheme, we may end with the code blowout. We need 
> > > > > > to generate very similar code for different types and variables. 
> > > > > > The worst thing here is that we will be unable to optimize this 
> > > > > > huge amount of code because the codegen relies on the runtime 
> > > > > > functions and the code cannot be inlined. That's why I would like 
> > > > > > to move as much as possible code to the runtime rather than to emit 
> > > > > > it in the compiler. 
> > > > > I understand your concerns. I think this is the best we can do right 
> > > > > now.
> > > > > 
> > > > > The most worrisome case will be when we have nested mappers within 
> > > > > each other. In this case, a mapper function will call another mapper 
> > > > > function. We can inline the inner mapper functions in this scenario, 
> > > > > so that these mapper function can be properly optimized. As a result, 
> > > > > I think the performance should be fine.
> > > > Instead, we can use indirect function calls passed in the array to the 
> > > > runtime. Do you think it is going to be slower? In your current scheme, 
> > > > we generate many runtime calls instead. Could you try to estimate the 
> > > > number of calls in cases if we'll call the mappers through the indirect 
> > > > function calls and in your cuurent scheme, where we need to call the 
> > > > runtime functions many times in each particular mapper?
> > > Hi Alexey,
> > > 
> > > Sorry I don't understand your idea. What indirect function calls do you 
> > > propose to be passed to the runtime? What are these functions supposed to 
> > > do?
> > > 
> > > The number of function calls will be exactly equal to the number of 
> > > components mapped, no matter whether there are nested mappers or not. The 
> > > number of components depend on the program. E.g., if we map a large array 
> > > section, then there will be many more function calls.
> > I mean the pointers to the mapper function, generated by the compiler. In 
> > your comment, it is `c.Mapper()`
> If we pass nested mapper functions to the runtime, I think it will slow down 
> execution because of the extra level of indirect function calls. E.g., the 
> runtime will call `omp_mapper1`, which calls the runtime back, which calls 
> `omp_mapper2`, .... This can result in a deep call stack.
> 
> I think the current implementation will be more efficient, which doesn't pass 
> nested mappers to the runtime. One call to the outer most mapper function 
> will have all data mapping done. The call stack will be 2 level deep (the 
> first level is the mapper function, and the second level is 
> `__tgt_push_mapper_component`) in this case from the runtime. There are also 
> more compiler optimization space when we inline all nested mapper functions.
Yes, if we leave it as is. But if instead of the bunch unique functions we'll 
have the common one, that accept list if indirect pointers to functions 
additionally, and move it to the runtime library, we won't need those 2 
functions we have currently. We'll have full access to the mapping data vector 
in the runtime library and won't need to use those 2 accessors we have 
currently. Instead, we'll need just one runtime functions, which implements the 
whole mapping logic. We still need to call it recursively, but I assume the 
number of calls will remain the same as in the current scheme. Did you 
understand the idea? If yes, it would good if you coild try to estimate the 
number of function calls in current scheme and in this new scheme to estimate 
possible pros and cons.


================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:744
   OMPRTL__tgt_target_data_update_nowait,
+  // Call to int64_t __tgt_mapper_num_components(void *rt_mapper_handle);
+  OMPRTL__tgt_mapper_num_components,
----------------
lildmh wrote:
> ABataev wrote:
> > Do we really need to use `int64_t` for number of elements? `size_t` must be 
> > enough.
> Because we use the return value to shift the memberof filed of map type, 
> which is `int64_t`, so I think `int64_t` makes sense here.
Ok, there is a discrepancy between runtime part and compiler part: 
`__tgt_push_mapper_component` uses `size_t` for size, while the runtime 
function uses `int64_t`. It won't work for 32bit platform.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59474/new/

https://reviews.llvm.org/D59474



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to