On Mon, Jun 26, 2017 at 1:50 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > Richard Biener <richard.guent...@gmail.com> writes: >> On Mon, Jun 26, 2017 at 1:14 PM, Richard Sandiford >> <richard.sandif...@linaro.org> wrote: >>> Richard Biener <richard.guent...@gmail.com> writes: >>>> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford >>>> <richard.sandif...@linaro.org> wrote: >>>>> Richard Biener <richard.guent...@gmail.com> writes: >>>>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford >>>>>> <richard.sandif...@linaro.org> wrote: >>>>>>> The test case triggered this assert in >>>>>>> vect_update_misalignment_for_peel: >>>>>>> >>>>>>> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>>>>>> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>>>>>> >>>>>>> We knew that the two DRs had the same misalignment at runtime, but when >>>>>>> considered in isolation, one data reference guaranteed a higher >>>>>>> compile-time >>>>>>> base alignment than the other. >>>>>>> >>>>>>> In the test case this looks like a missed opportunity. Both references >>>>>>> are unconditional, so it should be possible to use the highest of the >>>>>>> available base alignment guarantees when analyzing each reference. >>>>>>> The patch does this. >>>>>>> >>>>>>> However, as the comment in the patch says, the base alignment guarantees >>>>>>> provided by a conditional reference only apply if the reference occurs >>>>>>> at least once. In this case it would be legitimate for two references >>>>>>> to have the same runtime misalignment and for one reference to provide a >>>>>>> stronger compile-time guarantee than the other about what the >>>>>>> misalignment >>>>>>> actually is. The patch therefore relaxes the assert to handle that >>>>>>> case. >>>>>> >>>>>> Hmm, but you don't actually check whether a reference occurs only >>>> conditional, >>>>>> do you? You just seem to say that for masked loads/stores the reference >>>>>> is conditional (I believe that's not true). But for a loop like >>>>>> >>>>>> for (;;) >>>>>> if (a[i]) >>>>>> sum += b[j]; >>>>>> >>>>>> you still assume b[j] executes unconditionally? >>>>> >>>>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL >>>>> was supposed to mean "even if the containing statement executes >>>>> and runs to completion, the reference might not actually occur". >>>>> The example above isn't conditional in that sense because the >>>>> reference to b[j] does occur if the store is reached and completes. >>>>> >>>>> Masked loads and stores are conditional in that sense though. >>>>> The reference only occurs if the mask is nonzero; the memory >>>>> isn't touched otherwise. The functions are used to if-convert >>>>> things like: >>>>> >>>>> for (...) >>>>> a[i] = b[i] ? c[i] : d[i]; >>>>> >>>>> where there's no guarantee that it's safe to access c[i] when !b[i] >>>>> (or d[i] when b[i]). No reference occurs for an all-false mask. >>>> >>>> But as you touch generic data-ref code here you should apply more >>>> sensible semantics to DR_IS_CONDITIONAL than just marking >>>> masked loads/stores but not DRs occuring inside BBs only executed >>>> conditionally ... >>> >>> I don't see why that's more sensible though. If a statement is only >>> conditionally executed in a loop, it's up to the consumer to decide >>> what to do about that. The conditions under which the statement >>> is reached are a control-flow issue and tree-data-ref.c doesn't >>> have any special information about it. >>> >>> Masked loads and stores are special because the DR_REFs created by >>> tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the >>> original DR_STMT. And AIUI they didn't exist as MEM_REFs precisely >>> because they're not guaranteed to happen, even if the load or store >>> statement itself is executed. So in this case the DR_IS_CONDITIONAL >>> is reflecting something that tree-data-ref.c itself has done. >>> >>> How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the >>> general-sounding name? >> >> That sounds better and avoids the ambiguity. > > OK. > >>>>>> The vectorizer of course only sees unconditionally executed stmts. >>>>>> >>>>>> So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into >>>>>> any real-world (testsuite) issues without this? >>>>> >>>>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment >>>>> assumptions in silly corner cases. I could add a scan test for it, >>>>> for targets with masked loads and stores. It wouldn't trigger >>>>> an execution failure though because we assume that targets with >>>>> masked loads and stores allow unaligned accesses: >>>>> >>>>> /* For now assume all conditional loads/stores support unaligned >>>>> access without any special code. */ >>>>> if (is_gimple_call (stmt) >>>>> && gimple_call_internal_p (stmt) >>>>> && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD >>>>> || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) >>>>> return dr_unaligned_supported; >>>>> >>>>> So the worst that would happen is that we'd supposedly peel for >>>>> alignment, but actually misalign everything instead, and so make >>>>> things slower rather than quicker. >>>>> >>>>>> Note that the assert is to prevent bogus information. Iff we aligned >>>>>> DR with base alignment 8 and misalign 3 then if another same-align >>>>>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT >>>>>> as it still can be 8 after aligning DR. >>>>>> >>>>>> So I think it's wrong to put DRs with differing base-alignment into >>>>>> the same-align-refs chain, those should get their DR_MISALIGNMENT >>>>>> updated independenlty after peeling. >>>>> >>>>> DR_MISALIGNMENT is relative to the vector alignment rather than >>>>> the base alignment though. So: >>>> >>>> We seem to use it that way, yes (looking at set_ptr_info_alignment >>>> uses). So why not fix the assert then by capping the >>>> alignment/misalignment >>>> we compute at this value as well? (and document this in the header >>>> around DR_MISALIGNMENT) >>>> >>>> Ideally we'd do alignment analysis independent of the vector size >>>> though (for those stupid targets with multiple vector sizes to >>>> consider...). >>>> >>>>> a) when looking for references *A1 and *A2 with the same alignment, >>>>> we simply have to prove that A1 % vecalign == A2 % vecalign. >>>>> This doesn't require any knowledge about the base alignment. >>>>> If we break the addresses down as: >>>>> >>>>> A1 = BASE1 + REST1, REST1 = INIT1 + OFFSET1 + X * STEP1 >>>>> A2 = BASE2 + REST2, REST2 = INIT2 + OFFSET2 + X * STEP2 >>>>> >>>>> and can prove that BASE1 == BASE2, the alignment of that base >>>>> isn't important. We simply need to prove that REST1 % vecalign >>>>> == REST2 % vecalign for all X. >>>>> >>>>> b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed >>>>> to be vector-aligned. If DR_PEEL is A1 in the example above, we have >>>>> A1 % vecalign == 0, so A2 % vecalign must be 0 too. This again doesn't >>>>> rely on the base alignment being known. >>>>> >>>>> What a high base alignment for DR_PEEL gives us is the ability to know >>>>> at compile how many iterations need to be peeled to make DR_PEEL aligned. >>>>> But the points above apply regardless of whether we know that value at >>>>> compile time or not. >>>>> >>>>> In examples like the test case, we would have known at compile time that >>>>> VF-1 iterations would need to be peeled if we'd picked the store as the >>>>> DR_PEEL, but would have treated the number of peels as variable if we'd >>>>> picked the load. The value calculated at runtime would still have been >>>>> VF-1, it's just that the code wouldn't have been as efficient. >>>>> >>>>> One of the benefits of pooling the alignments for unconditional references >>>>> is that it no longer matters which DR we pick: the number of peels will >>>>> be a compile-time constant both ways. >>>>> >>>>> Thanks, >>>>> Richard >>>>> >>>>>> I'd rather not mix fixing this with the improvement to eventuall use a >>>>>> larger align for the other DR if possible. >>>> >>>> ^^^ >>>> >>>> So can you fix the ICE with capping base alignment / DR_MISALIGNMENT? >>> >>> I don't think the problem is the lack of a cap. In the test case we >>> see that: >>> >>> 1. B is known at compile time to be X * vecsize + Y when considered in >>> isolation, because the base alignment derived from its DR_REF >= vecsize. >>> So DR_MISALIGNMENT (B) == Y. >>> >>> 2. A's misalignment wrt vecsize is not known at compile time when >>> considered in isolation, because no useful base alignment can be >>> derived from its DR_REF. (The DR_REF is to a plain int rather than >>> to a structure with a high alignment.) So DR_MISALIGNMENT (A) == -1. >>> >>> 3. A and B when considered as a pair trivially have the same misalignment >>> wrt vecsize, for the reasons above. >>> >>> Each of these results is individually correct. The problem is that the >>> assert is conflating two things: it's saying that if we know two datarefs >>> have the same misaligment, we must either be able to calculate a >>> compile-time misalignment for both datarefs in isolation, or we must >>> fail to calculate a compile-time misalignment for both datarefs in >>> isolation. That isn't true: it's valid to have situations in which the >>> compile-time misalignment is known for one dataref in isolation but not >>> for the other. >> >> True. So the assert should then become >> >> gcc_assert (! known_alignment_for_access_p (dr) >> || DR_MISALIGNMENT (dr) / dr_size == >> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >> >> ? > > I think it would need to be: > > gcc_assert (!known_alignment_for_access_p (dr) > || !known_alignment_for_access_p (dr_peel) > || (DR_MISALIGNMENT (dr) / dr_size > == DR_MISALIGNMENT (dr_peel) / dr_peel_size));
I think for !known_alignment_for_access_p (dr_peel) the assert doesn't make any sense (DR_MISALIGNMENT is -1), so yes, you are right. > But yeah, that would work too. The idea with the assert in the patch was > that for unconditional references we probably *do* want to try to compute > the same compile-time misalignment, but for optimisation reasons rather > than correctness. Maybe that's more properly a gcc_checking_assert > though, since nothing goes wrong if it fails. So perhaps: We shouldn't have asserts for optimization reasons, even with checking IMHO. > > gcc_checking_assert (DR_IS_CONDITIONAL_IN_STMT (dr) > || DR_IS_CONDITIONAL_IN_STMT (dr_peel) > || (known_alignment_for_access_p (dr) > == known_alignment_for_access_p (dr_peel))); > > as a follow-on assert. > > Should I split it into two patches, one to change the gcc_assert and > another to add the optimisation? Yes please. Thanks, Richard. > Thanks, > Richard