On Wed, Nov 20, 2024 at 10:19:13AM +0100, Richard Biener wrote:
> On Tue, Nov 19, 2024 at 5:46 PM Lewis Hyatt <lhy...@gmail.com> wrote:
> >
> > On Tue, Nov 19, 2024 at 9:59 AM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Sun, Nov 17, 2024 at 4:28 AM Lewis Hyatt <lhy...@gmail.com> wrote:
> > > >
> > > > Currently, when we allocate a gphi object, we round up the capacity for 
> > > > the
> > > > trailing arguments array such that it will make full use of the page 
> > > > size
> > > > that ggc will allocate. While there is also an explicit minimum of 2
> > > > arguments, in practice after rounding to the ggc page size there is 
> > > > always
> > > > room for at least 4.
> > > >
> > > > It seems we have some code that has come to depend on there being this 
> > > > much
> > > > room before reallocation of a PHI is required. For example, the function
> > > > loop_version () used during loop optimization will make sure there is 
> > > > room
> > > > for an additional edge on each PHI that it processes. But there are call
> > > > sites which cache a PHI pointer prior to calling loop_version () and 
> > > > assume
> > > > it remains valid afterward, thus implicitly assuming that the PHI will 
> > > > have
> > > > spare capacity. Examples include split_loop () and gen_parallel_loop ().
> > > >
> > > > This works fine now, but if the size of a gphi becomes larger, e.g. due 
> > > > to
> > > > configuring location_t to be a 64-bit type, then on 32-bit platforms it 
> > > > ends
> > > > up being possible to get a gphi with only 2 arguments of capacity, 
> > > > causing
> > > > the above call sites of loop_version () to fail. (They store a pointer 
> > > > to a
> > > > gphi object that no longer has the same meaning it did before it got
> > > > reallocated.) The testcases gcc.dg/torture/pr113707-2.c and
> > > > gcc.dg/graphite/pr81945.c exhibit that failure mode.
> > > >
> > > > It may be necessary to adjust those call sites to make this more 
> > > > robust, but
> > > > in the meantime, changing the minimum from 2 to 4 does no harm given the
> > > > minimum is practically 4 anyway, and it resolves the issue for 32-bit
> > > > platforms.
> > >
> > > We need to fix the users.  Note ideal_phi_node_len rounds up to a power 
> > > of two
> > > but extra_order_size_table also has MAX_ALIGNMENT * n with n from 1 to 16
> > > buckets, so such extensive rounding up is not needed.
> > >
> > > The cache is also quite useless this way (I didn't fix this when last 
> > > working
> > > there).
> > >
> >
> > Adjusting the call sites definitely sounds right, but I worry it's
> > potentially a big change?
> 
> I already had to fixup quite some places because gphis now can be
> ggc_free()d when removed or re-allocated.  It seems you simply uncovered
> more of those places.
> 
> > So one of the call sites that caused problems here was around line 620
> > in tree-ssa-loop-split.cc:
> >
> >  /* Find a loop PHI node that defines guard_iv directly,
> >            or create one doing that.  */
> >         gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
> >         if (!phi)
> >
> > It remembers "phi" and reuses it later when it might have been
> > invalidated. That one is easy to fix, find_or_create_guard_phi() can
> > just be called again later.
> 
> One trick is to instead remember the PHI result SSA variable and
> later check its SSA_NAME_DEF_STMT which will be the re-allocated
> PHI.  I think this should work almost everywhere for the re-allocation issue.
> 
> > But the other one I ran into with testing was in tree-parloops.cc:
> >
> > /* Element of the hashtable, representing a
> >    reduction in the current loop.  */
> > struct reduction_info
> > {
> >   gimple *reduc_stmt;           /* reduction statement.  */
> >   gimple *reduc_phi;            /* The phi node defining the reduction.  */
> >   enum tree_code reduction_code;/* code for the reduction operation.  */
> >   unsigned reduc_version;       /* SSA_NAME_VERSION of original reduc_phi
> >                                    result.  */
> >   gphi *keep_res;               /* The PHI_RESULT of this phi is the
> > resulting value
> >                                    of the reduction variable when
> > existing the loop. */
> >   tree initial_value;           /* The initial value of the reduction
> > var before entering the loop.  */
> >   tree field;                   /*  the name of the field in the
> > parloop data structure intended for reduction.  */
> >   tree reduc_addr;              /* The address of the reduction variable for
> >                                    openacc reductions.  */
> >   tree init;                    /* reduction initialization value.  */
> >   gphi *new_phi;                /* (helper field) Newly created phi
> > node whose result
> >                                    will be passed to the atomic
> > operation.  Represents
> >                                    the local result each thread
> > computed for the reduction
> >                                    operation.  */
> > };
> >
> > It keeps a hash map of these throughout the pass and the whole design
> > is predicated on those pointers always remaining valid, so I think
> > this would need extensive redesign? I did not look into all the
> > details there though.
> >
> > Once I saw that, I tried the approach shown here of "just" changing
> > the minimum number to 4, given that on most platforms there are always
> > 4 anyway, and that fixed both of these on sparc as well as other
> > issues that I hadn't looked into yet, so there are going to be more as
> > well and I'm not sure how pervasive it is.
> >
> > Does "fix the users" mean, change all call sites to understand that a
> > phi pointer could be invalidated and handle it, or should it mean,
> > change all call sites to arrange that none of the phi pointers they
> > use will be invalidated?
> 
> Well, "fix" means either of the two, whatever looks cleaner or easier.
> 
> > I am not very familiar with these internals
> > yet, although I am happy to look into it. Do you think it's a
> > prerequisite for the location_t change or it could be a followup
> > later, given it doesn't have any effect on 64-bit platforms anyway?
> 
> As you ran into these specific cases can you at least file individual
> bugzillas?  If the issues cause testsuite fallout on 32bit platforms
> we should fix those before GCC 15, so technically it blocks the change.
> It would for example be bad if bootstrap breaks on arm32.
> 
> Richard.
> 

Thank you for all the pointers. I will remove this patch and replace it
with one that fixes all the call sites.

-Lewis

Reply via email to