ABataev added a comment.

In D102107#2867670 <https://reviews.llvm.org/D102107#2867670>, @josemonsalve2 
wrote:

> In D102107#2867417 <https://reviews.llvm.org/D102107#2867417>, @ABataev wrote:
>
>> In D102107#2867382 <https://reviews.llvm.org/D102107#2867382>, @jdoerfert 
>> wrote:
>>
>>> In D102107#2832740 <https://reviews.llvm.org/D102107#2832740>, @ABataev 
>>> wrote:
>>>
>>>> In D102107#2832286 <https://reviews.llvm.org/D102107#2832286>, @jdoerfert 
>>>> wrote:
>>>>
>>>>> In D102107#2824581 <https://reviews.llvm.org/D102107#2824581>, @ABataev 
>>>>> wrote:
>>>>>
>>>>>> In D102107#2823706 <https://reviews.llvm.org/D102107#2823706>, 
>>>>>> @jdoerfert wrote:
>>>>>>
>>>>>>> In D102107#2821976 <https://reviews.llvm.org/D102107#2821976>, @ABataev 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> We used this kind of codegen initially but later found out that it 
>>>>>>>> causes a large overhead when gathering pointers into a record. What 
>>>>>>>> about a hybrid scheme where the first args are passed as arguments and 
>>>>>>>> the others (if any) are gathered into a record?
>>>>>>>
>>>>>>> I'm confused, maybe I misunderstand the problem. The parallel function 
>>>>>>> arguments need to go from the main thread to the workers somehow, I 
>>>>>>> don't see how this is done w/o a record. This patch makes it explicit 
>>>>>>> though.
>>>>>>
>>>>>> Pass it in a record for workers only? And use a hybrid scheme for all 
>>>>>> other parallel regions.
>>>>>
>>>>> I still do not follow. What does it mean for workers only? What is a 
>>>>> hybrid scheme? And, probably most importantly, how would we not 
>>>>> eventually put everything into a record anyway?
>>>>
>>>> On the host you don’t need to put everything into a record, especially for 
>>>> small parallel regions. Pass the first few args in registers and gather 
>>>> only the remaining args into the record. For workers, just pass all args 
>>>> in the record.
>>>
>>> Could you please respond to my question so we can make progress here? We 
>>> *always* have to pass things in a record, do you agree?
>>
>> On the GPU device, yes. And I'm absolutely fine with packing args for the 
>> GPU device. But the patch packs the args not only for the GPU devices but 
>> also for the host and other devices which may not require packing/unpacking. 
>> For such devices and the host, it is better to avoid packing/unpacking, as 
>> it introduces overhead in many cases.
>
> Hi Alexey,
>
> Wouldn't you always need to pack to pass the arguments to the outlined 
> function? What is the benefit of avoiding packing the arguments in the 
> runtime call, if then you have to pack them for the outlined function?
>
> I would really appreciate an example, since I am just getting an 
> understanding of OpenMP in LLVM.
>
> Thanks!

Hi, generally speaking, no, you don't need to pack them. Initially, we 
packed/unpacked the args, but later decided not to.
Here is an example:

  int a, b;
  #pragma omp parallel
    printf("%d %d\n", a, b);

What we generate currently is something like this:

  %a = alloca i32
  %b = alloca i32
  call __kmpc_fork_call(..., @outlined, %a, %b)
  ...
  internal @outlined(i32 *%a, i32 *%b) {
    printf(....);
  }

Internally, `__kmpc_fork_call` calls the `@outlined` function with the passed 
args.
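
To make the difference concrete, here is a minimal, self-contained C sketch of 
the two calling schemes discussed in this thread. All names in it 
(`toy_fork_args`, `toy_fork_record`, `struct captured`, and so on) are 
hypothetical stand-ins invented for illustration. This is not the real 
`__kmpc_fork_call` and not what Clang emits; it only approximates the shape of 
each scheme.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; not the real libomp API. */

/* Scheme 1: captured variables are passed as individual pointer arguments. */
static int outlined_args(int *a, int *b) {
    return *a + *b;
}

/* Toy "fork" that forwards the arguments directly to the outlined function. */
static int toy_fork_args(int (*fn)(int *, int *), int *a, int *b) {
    return fn(a, b);
}

/* Scheme 2: captured variables are gathered into a record first. */
struct captured {
    int *a;
    int *b;
};

static int outlined_record(void *raw) {
    struct captured *c = (struct captured *)raw; /* unpack the record */
    return *c->a + *c->b;
}

/* Toy "fork" that hands a single opaque record to the outlined function. */
static int toy_fork_record(int (*fn)(void *), void *record) {
    return fn(record);
}

int sum_via_args(int a, int b) {
    /* No packing: the pointers go straight through as arguments. */
    return toy_fork_args(outlined_args, &a, &b);
}

int sum_via_record(int a, int b) {
    struct captured c = { &a, &b }; /* pack: extra stores on the caller side */
    return toy_fork_record(outlined_record, &c);
}
```

Both `sum_via_args(1, 2)` and `sum_via_record(1, 2)` return 3. The record 
version additionally stores the pointers into `c` at the call site and loads 
them back in the outlined function, at least before optimization, which is the 
packing/unpacking cost referred to above.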

>>> If we pack the things eventually to pass it to the workers, why would we 
>>> not pack it right away and avoid complexity? Passing varargs, then packing 
>>> them later (with the same thread) into a record to give it to the workers 
>>> is arguably introducing cost. What is the benefit of a hybrid approach 
>>> given that it is (theoretically) more costly and arguably more complex?




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102107/new/

https://reviews.llvm.org/D102107

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
