ABataev added a comment.

In D102107#2867382 <https://reviews.llvm.org/D102107#2867382>, @jdoerfert wrote:

> In D102107#2832740 <https://reviews.llvm.org/D102107#2832740>, @ABataev wrote:
>
>> In D102107#2832286 <https://reviews.llvm.org/D102107#2832286>, @jdoerfert wrote:
>>
>>> In D102107#2824581 <https://reviews.llvm.org/D102107#2824581>, @ABataev wrote:
>>>
>>>> In D102107#2823706 <https://reviews.llvm.org/D102107#2823706>, @jdoerfert wrote:
>>>>
>>>>> In D102107#2821976 <https://reviews.llvm.org/D102107#2821976>, @ABataev wrote:
>>>>>
>>>>>> We used this kind of codegen initially, but later found that it causes
>>>>>> large overhead when gathering pointers into a record. What about a
>>>>>> hybrid scheme, where the first few args are passed as arguments and the
>>>>>> rest (if any) are gathered into a record?
>>>>>
>>>>> I'm confused; maybe I misunderstand the problem. The parallel function
>>>>> arguments need to get from the main thread to the workers somehow, and I
>>>>> don't see how this is done without a record. This patch just makes it
>>>>> explicit.
>>>>
>>>> Pass it in a record for workers only? And use a hybrid scheme for all 
>>>> other parallel regions.
>>>
>>> I still do not follow. What does "for workers only" mean? What is a hybrid
>>> scheme? And, probably most importantly, how would we avoid eventually
>>> putting everything into a record anyway?
>>
>> On the host you don’t need to put everything into a record, especially for
>> small parallel regions. Pass the first few args in registers and gather
>> only the remaining args into the record. For workers, just pass all args in
>> the record.
>
> Could you please respond to my question so we can make progress here? We
> *always* have to pass things in a record, do you agree?

On the GPU device, yes, and I'm absolutely fine with packing args for the GPU
device. But the patch packs the args not only for GPU devices but also for the
host and other devices that may not require packing/unpacking. For the host
and such devices it is better to avoid packing/unpacking, as it introduces
overhead in many cases.
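To make the trade-off concrete, here is a minimal C sketch of the two conventions under discussion, assuming a region whose captures are all `long *` and a hypothetical cut-off of two direct parameters (`DIRECT_ARGS`). All names (`region_hybrid`, `call_hybrid`, `region_packed`, `call_packed`) are illustrative only; they are not Clang codegen output or libomp entry points. The `packed` out-parameter counts how many pointers each call site stores into a record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical knobs for this sketch, not real compiler or runtime settings. */
#define DIRECT_ARGS 2   /* captures passed as ordinary parameters (hybrid) */
#define MAX_CAPTURES 8  /* upper bound on captures in this sketch */

/* Hybrid (host) variant: the first DIRECT_ARGS captures arrive as plain
   parameters; only the overflow, if any, is read from a record. The body
   just sums the captured values as a stand-in for real work. */
static long region_hybrid(long *c0, long *c1, void *const *rest, size_t nrest) {
  long sum = (c0 ? *c0 : 0) + (c1 ? *c1 : 0);
  for (size_t i = 0; i < nrest; ++i)
    sum += *(long *)rest[i];
  return sum;
}

static long call_hybrid(long **caps, size_t n, size_t *packed) {
  assert(n <= MAX_CAPTURES);
  void *rest[MAX_CAPTURES];
  size_t nrest = n > DIRECT_ARGS ? n - DIRECT_ARGS : 0;
  for (size_t i = 0; i < nrest; ++i)
    rest[i] = caps[DIRECT_ARGS + i]; /* pack only the overflow captures */
  *packed = nrest;
  return region_hybrid(n > 0 ? caps[0] : NULL, n > 1 ? caps[1] : NULL,
                       rest, nrest);
}

/* Fully packed (GPU worker) variant: in this sketch workers are assumed to
   receive only a pointer to shared state, so every capture goes through it. */
static long region_packed(void *const *record, size_t n) {
  long sum = 0;
  for (size_t i = 0; i < n; ++i)
    sum += *(long *)record[i];
  return sum;
}

static long call_packed(long **caps, size_t n, size_t *packed) {
  assert(n <= MAX_CAPTURES);
  void *record[MAX_CAPTURES];
  for (size_t i = 0; i < n; ++i)
    record[i] = caps[i]; /* pack every capture */
  *packed = n;
  return region_packed(record, n);
}
```

With two captures, the hybrid call site stores nothing into a record while the fully packed one stores both; with three captures, only the overflow capture is packed on the hybrid path.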

> If we eventually pack the things to pass them to the workers, why would we
> not pack them right away and avoid the complexity? Passing varargs and then
> packing them later (on the same thread) into a record for the workers
> arguably introduces cost. What is the benefit of a hybrid approach, given
> that it is (theoretically) more costly and arguably more complex?
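For comparison, here is a minimal C sketch of the pack-once argument. It assumes a hypothetical `capture_record` layout and hypothetical fork entry points (`fork_varargs`, `fork_packed`, `region_body`) that are not real runtime APIs. On the varargs path the callee has to copy the arguments into a record a second time before workers could see them; on the pack-once path the call site builds the record and every thread reads the same one.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

#define MAX_CAPTURES 8 /* hypothetical bound for this sketch */

/* Hypothetical record shared by the encountering thread and the workers. */
typedef struct {
  size_t n;
  void *args[MAX_CAPTURES];
} capture_record;

/* Region body run by every thread: reads captures only through the record
   (here it just sums the pointed-to values). */
static long region_body(const capture_record *rec) {
  long sum = 0;
  for (size_t i = 0; i < rec->n; ++i)
    sum += *(long *)rec->args[i];
  return sum;
}

/* Varargs entry: arguments arrive through the variadic call, then are
   copied again into a record so workers can reach them. */
static long fork_varargs(size_t *repacked, size_t n, ...) {
  assert(n <= MAX_CAPTURES);
  capture_record rec = { .n = n };
  va_list ap;
  va_start(ap, n);
  for (size_t i = 0; i < n; ++i)
    rec.args[i] = va_arg(ap, void *); /* second copy: varargs -> record */
  va_end(ap);
  *repacked = n;
  return region_body(&rec); /* a real runtime would hand &rec to workers */
}

/* Pack-once entry: the call site already built the record, so no copy. */
static long fork_packed(const capture_record *rec) {
  return region_body(rec);
}
```

Callers of `fork_varargs` should cast each capture to `void *`, because `va_arg(ap, void *)` must match the type actually passed.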


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102107/new/

https://reviews.llvm.org/D102107

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
