On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford
<[email protected]> wrote:
> Richard Biener <[email protected]> writes:
>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener
>> <[email protected]> wrote:
>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>>> <[email protected]> wrote:
>>>> This patch tries to calculate conservatively-correct distance
>>>> vectors for two references whose base addresses are not the same.
>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>>>> isn't guaranteed to occur.
>>>>
>>>> The motivating example is:
>>>>
>>>> struct s { int x[8]; };
>>>> void
>>>> f (struct s *a, struct s *b)
>>>> {
>>>> for (int i = 0; i < 8; ++i)
>>>> a->x[i] += b->x[i];
>>>> }
>>>>
>>>> in which the "a" and "b" accesses are either independent or have a
>>>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case
>>>> prevents vectorisation, so we can vectorise without an alias check.
>>>>
>>>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>>>
>>>> void
>>>> f (int a[][8], struct b[][8])
>>>> {
>>>> for (int i = 0; i < 8; ++i)
>>>> a[0][i] += b[0][i];
>>>> }
>>>>
>>>> I think this is valid because C11 6.7.6.2/6 says:
>>>>
>>>> For two array types to be compatible, both shall have compatible
>>>> element types, and if both size specifiers are present, and are
>>>> integer constant expressions, then both size specifiers shall have
>>>> the same constant value.
>>>>
>>>> So if we access an array through an int (*)[8], it must have type X[8]
>>>> or X[], where X is compatible with int. It doesn't seem possible in
>>>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>>>
>>>> However, Richard B said that (at least in gimple) we support arbitrary
>>>> overlap of arrays and allow arrays to be accessed with different
>>>> dimensionality. There are examples of this in PR50067. I've therefore
>>>> only handled references that end in a structure field access.
>>>>
>>>> There are two ways of handling these dependences in the vectoriser:
>>>> use them to limit VF, or check at runtime as before. I've gone for
>>>> the approach of checking at runtime if we can, to avoid limiting VF
>>>> unnecessarily. We still fall back to a VF cap when runtime checks
>>>> aren't allowed.
>>>>
>>>> The patch tests whether we queued an alias check with a dependence
>>>> distance of X and then picked a VF <= X, in which case it's safe to
>>>> drop the alias check. Since vect_prune_runtime_alias_check_list can
>>>> be called twice with different VF for the same loop, it's no longer
>>>> safe to clear may_alias_ddrs on exit. Instead we should use
>>>> comp_alias_ddrs to check whether versioning is necessary.
>>>>
>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>>>
>>> You seem to do your "fancy" thing but also later compute the old
>>> base equality anyway (for same_base_p). It looks to me for this
>>> case the new fancy code can be simply skipped, keeping num_dimensions
>>> as before?
>>>
>>> + /* Try to approach equal type sizes. */
>>> + if (!COMPLETE_TYPE_P (type_a)
>>> + || !COMPLETE_TYPE_P (type_b)
>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>>> + break;
>>>
>>> ah, interesting idea to avoid a quadratic search. Note that you should
>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
>>> as they are used for type-punning.
>
> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs,
> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that
> dr_analyze_indices allows, so I think we safe in terms of the tree codes.
Yeah. I think we need to document that we should have a 1:1 match here.
>>> I see nonoverlapping_component_refs_of_decl_p should simply skip
>>> ARRAY_REFs - but I also see there:
>>>
>>> /* ??? We cannot simply use the type of operand #0 of the refs here
>>> as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>> for common blocks instead of using unions like everyone else. */
>>> tree type1 = DECL_CONTEXT (field1);
>>> tree type2 = DECL_CONTEXT (field2);
>>>
>>> so you probably can't simply use TREE_TYPE (outer_ref) for type
>>> compatibility.
>>> You also may not use types_compatible_p here as for LTO that is _way_ too
>>> lax for aggregates. The above uses
>>>
>>> /* We cannot disambiguate fields in a union or qualified union. */
>>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>> return false;
>>>
>>> so you should also bail out on unions here, rather than the check you do
>>> later.
>
> The loop stops before we get to a union, so I think "only" the RECORD_TYPE
> COMPONENT_REF handling is a potential problem. Does this mean that
> I should use the nonoverlapping_component_refs_of_decl_p code:
>
> tree field1 = TREE_OPERAND (ref1, 1);
> tree field2 = TREE_OPERAND (ref2, 1);
>
> /* ??? We cannot simply use the type of operand #0 of the refs here
> as the Fortran compiler smuggles type punning into COMPONENT_REFs
> for common blocks instead of using unions like everyone else. */
> tree type1 = DECL_CONTEXT (field1);
> tree type2 = DECL_CONTEXT (field2);
>
> /* We cannot disambiguate fields in a union or qualified union. */
> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
> return false;
>
> if (field1 != field2)
> {
> /* A field and its representative need to be considered the
> same. */
> if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2
> || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1)
> return false;
> /* Different fields of the same record type cannot overlap.
> ??? Bitfields can overlap at RTL level so punt on them. */
> if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
> return false;
> return true;
> }
>
> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p
> during the new loop?
Yes. OTOH you want to "match" while the above disambiguates. So it means
you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL
equality (which should be the same in the end). The RTL concern
should not matter
here.
> And test for this as well as unions in the outer
> references?
So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT,
and you (should) stop the ref walk at DR_BASE_OBJECT. The dr_analyze_indices
code is also somewhat fishy in that it simply ignores everything below unhandled
component-refs even if there are indices involved (and it gets away
with this because
dependence analysis likely/hopefully gives up on the DR_BASE_OBJECT equality
test in case it is sth like a[i].union for example ... hopefully ...).
>>> You seem to rely on getting an access_fn entry for each handled_component_p.
>>> It looks like this is the case -- we even seem to stop at unions (with the
>>> same
>>> fortran "issue"). I'm not sure that's the best thing to do but you
>>> rely on that.
>
> Yeah, the loop is deliberately limited to the components associated with
> an access_fn. I did wonder at first whether dr_analyze_indices should
> store the original component reference trees for each access function.
> That would make things simpler and more explicit, but would also eat up
> more memory. Things like object_address_invariant_in_loop_p rely on the
> access_fns in the same way that the loop in the patch does.
in fact it fails to handle ARRAY_RANGE_REFs ...
>>> I don't understand the looping, it needs more comments. You seem to be
>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions
>>> is how many compatible refs you found on the way (with incompatible ones
>>> not counting?!). What about an inner varying array of structs? This seems
>>> to
>>> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real
>>> b[i].s.b[i].j?
>
> I'll try to improve the comments. But the idea is that both sequences are
> as long as possible, while that still gives compatible types. If there is
> more than one such sequence, we pick the one nearest the base.
>
> So in your example, the access functions would be:
>
> 0 1 2 3 4
> a: .j [i] .b .s [i]
>
> 0 1 2 3 4 5
> b: __real .j [i] .b .s [i]
>
> If a and b are pointers, the final access functions would be
> unconstrained base accesses, so we'd end up with:
>
> a: [0, 3]
> b: [1, 4]
>
> for both sequences.
>
>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
>>> conveniently start from the other
>>> end of the ref here.
>>
>> That said, for the motivational cases we either have one ref having
>> more dimensions than the other (the __real vs. full complex access) or
>> they have the same number of dimensions (and no access fn for the
>> base).
>>
>> For the first case we should simply "drop" access_fns of the larger
>> dimensional ref (from the start, plus outer component refs) up to the
>> point the number of dimensions are equal.
>
> Yeah, that's what happens for your example. But if we had:
>
> a[i].s.c.d
> __real b[i].s.b[i].j
>
> (where d is the same type as the real component) then the access
> functions would be:
>
> 0 1 2 3
> a: .d .c .s [i]
>
> 0 1 2 3 4 5
> b: __real .j [i] .b .s [i]
>
> Comparing the a0/b2 column doesn't make sense, because one's an array
> and the other is a structure. In this case the sequence we care about is:
>
> a: [1, 3]
> b: [3, 5]
>
> which is what the loop gives. The a1/b3 column is the one that proves
> there's no dependence.
>
>> Then we have the case of
>>
>> ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b))
>>
>> where we have to punt.
>>
>> Then we have the case of
>>
>> ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>>
>> which is where the new code should kick in to see if we can drop access_fns
>> from the other end (as unanalyzable but either having distance zero or not
>> aliased because of TBAA).
>>
>> At least your testcases suggest you do not want to handle
>>
>> struct s { int x[N]; };
>> struct r { struct s s; };
>> f (struct s *a, struct r *b)
>> {
>> for (i = 0; i < N; ++i)
>> a->s.x[i] = b->x[i];
>> }
>>
>> ?
>>
>> With this example your loop which seems to search for a "common"
>> sequence in (different) midst of the reference trees makes more sense
>> (still that loop is awkward to understand).
>
> Yeah, I want to handle that too, just hadn't thought of it as a specific
> testcase. The code does give the expected dependence distance of 0.
Ok.
I think the patch is reasonable, maybe the loop can be restructured / simplified
a bit and handling of the union case for example be done first (by looking at
DR_BASE_OBJECT).
Thanks,
Richard.
>
> Thanks,
> Richard