ABataev added a comment. In D102107#2867670 <https://reviews.llvm.org/D102107#2867670>, @josemonsalve2 wrote:
> In D102107#2867417 <https://reviews.llvm.org/D102107#2867417>, @ABataev wrote: > >> In D102107#2867382 <https://reviews.llvm.org/D102107#2867382>, @jdoerfert >> wrote: >> >>> In D102107#2832740 <https://reviews.llvm.org/D102107#2832740>, @ABataev >>> wrote: >>> >>>> In D102107#2832286 <https://reviews.llvm.org/D102107#2832286>, @jdoerfert >>>> wrote: >>>> >>>>> In D102107#2824581 <https://reviews.llvm.org/D102107#2824581>, @ABataev >>>>> wrote: >>>>> >>>>>> In D102107#2823706 <https://reviews.llvm.org/D102107#2823706>, >>>>>> @jdoerfert wrote: >>>>>> >>>>>>> In D102107#2821976 <https://reviews.llvm.org/D102107#2821976>, @ABataev >>>>>>> wrote: >>>>>>> >>>>>>>> We used this kind of codegen initially but later found out that it >>>>>>>> causes a large overhead when gathering pointers into a record. What >>>>>>>> about hybrid scheme where the first args are passed as arguments and >>>>>>>> others (if any) are gathered into a record? >>>>>>> >>>>>>> I'm confused, maybe I misunderstand the problem. The parallel function >>>>>>> arguments need to go from the main thread to the workers somehow, I >>>>>>> don't see how this is done w/o a record. This patch makes it explicit >>>>>>> though. >>>>>> >>>>>> Pass it in a record for workers only? And use a hybrid scheme for all >>>>>> other parallel regions. >>>>> >>>>> I still do not follow. What does it mean for workers only? What is a >>>>> hybrid scheme? And, probably most importantly, how would we not >>>>> eventually put everything into a record anyway? >>>> >>>> On the host you don’t need to put everything into a record, especially for >>>> small parallel regions. Pass some first args in registers and only the >>>> remaining args gather into the record. For workers just pass all args in >>>> the record. >>> >>> Could you please respond to my question so we make progress here. We >>> *always* have to pass things in a record, do you agree? >> >> On the GPU device, yes. And I'm absolutely fine with packing args for the >> GPU device. But the patch packs the args not only for the GPU devices but >> also for the host and other devices which may not require packing/unpacking. >> For such devices/host better to avoid packing/unpacking as it introduces >> overhead in many cases. > > Hi Alexey, > > Wouldn't you always need to pack to pass the arguments to the outlined > function? What is the benefit of avoiding packing the arguments in the > runtime call, if then you have to pack them for the outlined function? > > I would really appreciate an example, since I am just getting an > understanding of OpenMP in LLVM. > > Thanks! Hi, generally speaking, no, you don't need to pack them. Initially, we packed/unpacked args, but then decided not to do it. Here is an example: int a, b; #pragma omp parallel printf("%d %d\n", a, b); What we generate currently is something like this: %a = alloca i32 %b = alloca i32 call __kmpc_fork_call(..., @outlined, %a, %b) ... internal @outlined(i32 *%a, i32 *%b) { printf(....); } `__kmpc_fork_call` inside calls `@outlined` function with the passed args. >>> If we pack the things eventually to pass it to the workers, why would we >>> not pack it right away and avoid complexity? Passing varargs, then packing >>> them later (with the same thread) into a record to give it to the workers >>> is arguably introducing cost. What is the benefit of a hybrid approach >>> given that it is (theoretically) more costly and arguably more complex? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D102107/new/ https://reviews.llvm.org/D102107 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits