[PATCH] c++: coroutines - Overlap variables in frame [PR105989]
Currently, coroutine frames store all variables of a coroutine separately, even if their lifetime does not overlap (they are in distinct scopes). This patch implements overlapping distinct variable scopes in the coroutine frame, by storing the frame fields in nested unions of structs. This lowers the size of the frame for larger coroutines significantly, and makes them more usable on systems with limited memory. Bootstrapped and regression tested on x86_64-pc-linux-gnu; new test fails before the patch and succeeds after with no regressions. PR c++/105989 gcc/cp/ChangeLog: * coroutines.cc (struct local_var_info): Add field_access_path. (build_local_var_frame_access_expr): New. (transform_local_var_uses): Use build_local_var_frame_access_expr. (coro_make_frame_entry_id): New. (coro_make_frame_entry): Delegate to coro_make_frame_entry_id. (struct local_vars_frame_data): Add orig, field_access_path. (register_local_var_uses): Generate new frame layout. Create access paths to vars. (morph_fn_to_coro): Set new fields in local_vars_frame_data. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr105989.C: New test. pr105989.patch Description: Binary data
Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]
Hi Iain, Thanks for the reply, this is my first time contributing and I am looking forward to your input. One other related thing I would like to investigate is reducing the number of compiler generated variables in the frame, particularly _Coro_destroy_fn and _Coro_self_handle. As I understand it, _Coro_destroy_fn just sets a flag in _Coro_resume_index and calls _Coro_resume_fn; it should be possible to move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn is stored in the frame; this would however change the coroutine ABI - I don't know if that's a problem. The _Coro_self_handle should be constructible on-demand from the frame address. Do you have any advice / opinions on this before I try to implement it? Michal On Jul 12 2022, at 4:08 pm, Iain Sandoe wrote: > Hi Michal, > >> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches >> wrote: >> >> Currently, coroutine frames store all variables of a coroutine separately, >> even if their lifetime does not overlap (they are in distinct >> scopes). This >> patch implements overlapping distinct variable scopes in the >> coroutine frame, >> by storing the frame fields in nested unions of structs. This lowers >> the size >> of the frame for larger coroutines significantly, and makes them more usable >> on systems with limited memory. > > not a review (I will try to take a look at the weekend). > > but … this is one of the two main optimisations on my TODO - so cool > for doing it. > > (the other related optimisation is to eliminate frame entries for > scopes without any suspend > points - which has the potential to save even more space for code with > sparse use of co_) > > Iain > >> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new test fails >> before the patch and succeeds after with no regressions. >> >> PR c++/105989 >> >> gcc/cp/ChangeLog: >> >> * coroutines.cc (struct local_var_info): Add field_access_path. >> (build_local_var_frame_access_expr): New. >> (transform_local_var_uses): Use build_local_var_frame_access_expr. >> (coro_make_frame_entry_id): New. >> (coro_make_frame_entry): Delegate to coro_make_frame_entry_id. >> (struct local_vars_frame_data): Add orig, field_access_path. >> (register_local_var_uses): Generate new frame layout. Create access >> paths to vars. >> (morph_fn_to_coro): Set new fields in local_vars_frame_data. >> >> gcc/testsuite/ChangeLog: >> >> * g++.dg/coroutines/pr105989.C: New test. >> >> > >
Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]
; is not referenced by the awaiter; another unused object gets stored if > > the .await_transform() customization point was used. > > What are your thoughts on the feasibility / difficulty of implementing > such an optimization? > > Michal > >>> >>> Do you have any advice / opinions on this before I try to implement it? >> >> Hopefully, the notes above help. >> >> I will rebase my latest code changes as soon as I have a chance and >> put them somewhere >> for you to look at - basically, these are to try and address the >> correctness issues we face, >> >> Iain >> >> >>> >>> Michal >>> >>> On Jul 12 2022, at 4:08 pm, Iain Sandoe wrote: >>> >>>> Hi Michal, >>>> >>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches >>>>> wrote: >>>>> >>>>> Currently, coroutine frames store all variables of a coroutine separately, >>>>> even if their lifetime does not overlap (they are in distinct >>>>> scopes). This >>>>> patch implements overlapping distinct variable scopes in the >>>>> coroutine frame, >>>>> by storing the frame fields in nested unions of structs. This lowers >>>>> the size >>>>> of the frame for larger coroutines significantly, and makes them >>>>> more usable >>>>> on systems with limited memory. >>>> >>>> not a review (I will try to take a look at the weekend). >>>> >>>> but … this is one of the two main optimisations on my TODO - so cool >>>> for doing it. >>>> >>>> (the other related optimisation is to eliminate frame entries for >>>> scopes without any suspend >>>> points - which has the potential to save even more space for code with >>>> sparse use of co_) >>>> >>>> Iain >>>> >>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new >>>>> test fails >>>>> before the patch and succeeds after with no regressions. >>>>> >>>>> PR c++/105989 >>>>> >>>>> gcc/cp/ChangeLog: >>>>> >>>>> * coroutines.cc (struct local_var_info): Add field_access_path. >>>>> (build_local_var_frame_access_expr): New. >>>>> (transform_local_var_uses): Use build_local_var_frame_access_expr. >>>>> (coro_make_frame_entry_id): New. >>>>> (coro_make_frame_entry): Delegate to coro_make_frame_entry_id. >>>>> (struct local_vars_frame_data): Add orig, field_access_path. >>>>> (register_local_var_uses): Generate new frame layout. Create access >>>>> paths to vars. >>>>> (morph_fn_to_coro): Set new fields in local_vars_frame_data. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * g++.dg/coroutines/pr105989.C: New test. >>>>> >>>>> >>>> >>>> >> >> > pr105989.patch Description: Binary data
Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]
me - I >>>> suppose that is a trade-off of one oiptimisation c.f. another … >>> >>> If the handle construction cannot be optimized out, and its thus >>> a tradeoff between frame size and number of instructions, then this >>> could also be enabled by a hypothetical `-fsmall-coroutine-frame`. >>> >>> Coming back to this: >>> >>>>>> (the other related optimisation is to eliminate frame entries for >>>>>> scopes without any suspend >>>>>> points - which has the potential to save even more space for code with >>>>>> sparse use of co_) >>> >>> This would be nice; although it could encompassed by a more general >>> >>> optimization - eliminate frame entries for all variables which are not >>> >>> accessed (directly or via pointer / reference) beyond a suspend point. >>> To be fair, I do not know how to get started on such an optimization, >>> or if it is even possible to do on the frontend. This would however be >>> immensely useful for reducing the frame size taken-up by complicated >>> co_await expressions (among other things), for example, if I have a >>> composed operation: >>> >>> co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2()); >>> >>> Right now, this creates space in the frame for the temporary 'leaf' >>> >>> awaitables, which were already moved into the composed awaitable. >>> If the awaitable has an operator co_await that returns the real awaiter, >>> the original awaitable is also stored in the frame, even if it >>> is not referenced by the awaiter; another unused object gets stored if >>> >>> the .await_transform() customization point was used. >>> >>> What are your thoughts on the feasibility / difficulty of implementing >>> such an optimization? >>> >>> Michal >>> >>>>> >>>>> Do you have any advice / opinions on this before I try to >>>>> implement it? >>>> >>>> Hopefully, the notes above help. >>>> >>>> I will rebase my latest code changes as soon as I have a chance and >>>> put them somewhere >>>> for you to look at - basically, these are to try and address the >>>> correctness issues we face, >>>> >>>> Iain >>>> >>>> >>>>> >>>>> Michal >>>>> >>>>> On Jul 12 2022, at 4:08 pm, Iain Sandoe wrote: >>>>> >>>>>> Hi Michal, >>>>>> >>>>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches >>>>>>> wrote: >>>>>>> >>>>>>> Currently, coroutine frames store all variables of a coroutine >>>>>>> separately, >>>>>>> even if their lifetime does not overlap (they are in distinct >>>>>>> scopes). This >>>>>>> patch implements overlapping distinct variable scopes in the >>>>>>> coroutine frame, >>>>>>> by storing the frame fields in nested unions of structs. This lowers >>>>>>> the size >>>>>>> of the frame for larger coroutines significantly, and makes them >>>>>>> more usable >>>>>>> on systems with limited memory. >>>>>> >>>>>> not a review (I will try to take a look at the weekend). >>>>>> >>>>>> but … this is one of the two main optimisations on my TODO - so cool >>>>>> for doing it. >>>>>> >>>>>> (the other related optimisation is to eliminate frame entries for >>>>>> scopes without any suspend >>>>>> points - which has the potential to save even more space for code with >>>>>> sparse use of co_) >>>>>> >>>>>> Iain >>>>>> >>>>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new >>>>>>> test fails >>>>>>> before the patch and succeeds after with no regressions. >>>>>>> >>>>>>> PR c++/105989 >>>>>>> >>>>>>> gcc/cp/ChangeLog: >>>>>>> >>>>>>> * coroutines.cc (struct local_var_info): Add field_access_path. >>>>>>> (build_local_var_frame_access_expr): New. >>>>>>> (transform_local_var_uses): Use >>>>>>> build_local_var_frame_access_expr. >>>>>>> (coro_make_frame_entry_id): New. >>>>>>> (coro_make_frame_entry): Delegate to coro_make_frame_entry_id. >>>>>>> (struct local_vars_frame_data): Add orig, field_access_path. >>>>>>> (register_local_var_uses): Generate new frame layout. Create >>>>>>> access >>>>>>> paths to vars. >>>>>>> (morph_fn_to_coro): Set new fields in local_vars_frame_data. >>>>>>> >>>>>>> >>>>>>> gcc/testsuite/ChangeLog: >>>>>>> >>>>>>> * g++.dg/coroutines/pr105989.C: New test. >>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> >> > >