On Wed, Nov 20, 2024 at 10:19:13AM +0100, Richard Biener wrote: > On Tue, Nov 19, 2024 at 5:46 PM Lewis Hyatt <lhy...@gmail.com> wrote: > > > > On Tue, Nov 19, 2024 at 9:59 AM Richard Biener > > <richard.guent...@gmail.com> wrote: > > > > > > On Sun, Nov 17, 2024 at 4:28 AM Lewis Hyatt <lhy...@gmail.com> wrote: > > > > > > > > Currently, when we allocate a gphi object, we round up the capacity for > > > > the > > > > trailing arguments array such that it will make full use of the page > > > > size > > > > that ggc will allocate. While there is also an explicit minimum of 2 > > > > arguments, in practice after rounding to the ggc page size there is > > > > always > > > > room for at least 4. > > > > > > > > It seems we have some code that has come to depend on there being this > > > > much > > > > room before reallocation of a PHI is required. For example, the function > > > > loop_version () used during loop optimization will make sure there is > > > > room > > > > for an additional edge on each PHI that it processes. But there are call > > > > sites which cache a PHI pointer prior to calling loop_version () and > > > > assume > > > > it remains valid afterward, thus implicitly assuming that the PHI will > > > > have > > > > spare capacity. Examples include split_loop () and gen_parallel_loop (). > > > > > > > > This works fine now, but if the size of a gphi becomes larger, e.g. due > > > > to > > > > configuring location_t to be a 64-bit type, then on 32-bit platforms it > > > > ends > > > > up being possible to get a gphi with only 2 arguments of capacity, > > > > causing > > > > the above call sites of loop_version () to fail. (They store a pointer > > > > to a > > > > gphi object that no longer has the same meaning it did before it got > > > > reallocated.) The testcases gcc.dg/torture/pr113707-2.c and > > > > gcc.dg/graphite/pr81945.c exhibit that failure mode. > > > > > > > > It may be necessary to adjust those call sites to make this more > > > > robust, but > > > > in the meantime, changing the minimum from 2 to 4 does no harm given the > > > > minimum is practically 4 anyway, and it resolves the issue for 32-bit > > > > platforms. > > > > > > We need to fix the users. Note ideal_phi_node_len rounds up to a power > > > of two > > > but extra_order_size_table also has MAX_ALIGNMENT * n with n from 1 to 16 > > > buckets, so such extensive rounding up is not needed. > > > > > > The cache is also quite useless this way (I didn't fix this when last > > > working > > > there). > > > > > > > Adjusting the call sites definitely sounds right, but I worry it's > > potentially a big change? > > I already had to fixup quite some places because gphis now can be > ggc_free()d when removed or re-allocated. It seems you simply uncovered > more of those places. > > > So one of the call sites that caused problems here was around line 620 > > in tree-ssa-loop-split.cc: > > > > /* Find a loop PHI node that defines guard_iv directly, > > or create one doing that. */ > > gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv); > > if (!phi) > > > > It remembers "phi" and reuses it later when it might have been > > invalidated. That one is easy to fix, find_or_create_guard_phi() can > > just be called again later. > > One trick is to instead remember the PHI result SSA variable and > later check its SSA_NAME_DEF_STMT which will be the re-allocated > PHI. I think this should work almost everywhere for the re-allocation issue. > > > But the other one I ran into with testing was in tree-parloops.cc: > > > > /* Element of the hashtable, representing a > > reduction in the current loop. */ > > struct reduction_info > > { > > gimple *reduc_stmt; /* reduction statement. */ > > gimple *reduc_phi; /* The phi node defining the reduction. */ > > enum tree_code reduction_code;/* code for the reduction operation. */ > > unsigned reduc_version; /* SSA_NAME_VERSION of original reduc_phi > > result. */ > > gphi *keep_res; /* The PHI_RESULT of this phi is the > > resulting value > > of the reduction variable when > > existing the loop. */ > > tree initial_value; /* The initial value of the reduction > > var before entering the loop. */ > > tree field; /* the name of the field in the > > parloop data structure intended for reduction. */ > > tree reduc_addr; /* The address of the reduction variable for > > openacc reductions. */ > > tree init; /* reduction initialization value. */ > > gphi *new_phi; /* (helper field) Newly created phi > > node whose result > > will be passed to the atomic > > operation. Represents > > the local result each thread > > computed for the reduction > > operation. */ > > }; > > > > It keeps a hash map of these throughout the pass and the whole design > > is predicated on those pointers always remaining valid, so I think > > this would need extensive redesign? I did not look into all the > > details there though. > > > > Once I saw that, I tried the approach shown here of "just" changing > > the minimum number to 4, given that on most platforms there are always > > 4 anyway, and that fixed both of these on sparc as well as other > > issues that I hadn't looked into yet, so there are going to be more as > > well and I'm not sure how pervasive it is. > > > > Does "fix the users" mean, change all call sites to understand that a > > phi pointer could be invalidated and handle it, or should it mean, > > change all call sites to arrange that none of the phi pointers they > > use will be invalidated? > > Well, "fix" means either of the two, whatever looks cleaner or easier. > > > I am not very familiar with these internals > > yet, although I am happy to look into it. Do you think it's a > > prerequisite for the location_t change or it could be a followup > > later, given it doesn't have any effect on 64-bit platforms anyway? > > As you ran into these specific cases can you at least file individual > bugzillas? If the issues cause testsuite fallout on 32bit platforms > we should fix those before GCC 15, so technically it blocks the change. > It would for example be bad if bootstrap breaks on arm32. > > Richard. >
Thank you for all the pointers. I will remove this patch and replace it with one that fixes all the call sites. -Lewis