On Fri, Jun 17, 2016 at 4:33 PM, Ilya Enkovich <[email protected]> wrote:
> 2016-06-16 9:00 GMT+03:00 Jeff Law <[email protected]>:
>> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> This patch introduces the changes required to run the vectorizer on a
>>> loop epilogue. It also enables epilogue vectorization using a smaller
>>> vector size.
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2016-05-19 Ilya Enkovich <[email protected]>
>>>
>>> * tree-if-conv.c (tree_if_conversion): Make public.
>>> * tree-if-conv.h: New file.
>>> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
>>> try to enhance alignment for epilogues.
>>> * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
>>> created loop.
>>> * tree-vect-loop.c: Include tree-if-conv.h.
>>> (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
>>> loop->aux.
>>> (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
>>> loop->aux.
>>> (vect_analyze_loop): Reset loop->aux.
>>> (vect_transform_loop): Check if created epilogue should be
>>> returned for further vectorization. If-convert epilogue if required.
>>> * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
>>> process and insert vectorized loop epilogues into this queue.
>>> * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return
>>> created loop.
>>> (vect_transform_loop): Return created loop.
>>
>> As Richi noted, the additional calls into the if-converter are unfortunate.
>> I'm not sure how else to avoid them though. It looks like we can run
>> if-conversion on just the epilogue, so maybe that's not too bad.
>>
>>
>>> @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo,
>>> bool clean_stmts)
>>> destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
>>> loop_vinfo->scalar_cost_vec.release ();
>>>
>>> + loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>>> free (loop_vinfo);
>>> - loop->aux = NULL;
>>> }
>>
>> Hmm, there seems to be a level of indirection I'm missing here. We're
>> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux. Ewww. I thought
>> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
>> the original loop to the vectorized epilogue. What am I missing? Rather
>> than smuggling around in the aux field, is there some inherent reason why we
>> can't just copy the info from the original loop directly into
>> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
>
> LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
> - mark this loop as epilogue
> - get VF of original loop (required for both mask and nomask modes)
> - get decision about epilogue masking
>
> That's all. When an epilogue is created it has no LOOP_VINFO. Also, when we
> vectorize a loop we create and destroy its LOOP_VINFO multiple times. When
> a loop has a LOOP_VINFO, loop->aux points to it and the original LOOP_VINFO
> is in LOOP_VINFO_ORIG_LOOP_INFO. When a loop has no LOOP_VINFO associated,
> I have no place to bind it to the original loop, and therefore I use the
> vacant loop->aux for that. Any other way to bind an epilogue to its original
> loop would work as well. I just chose loop->aux to avoid new fields and data
> structures.
Maybe simply change the way the vectorizer iterates over loops, e.g. by
recursing on the generated epilogue and passing down its origin?
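
A rough sketch of the shape I have in mind, using toy types only. Nothing
here is the real vectorizer API: toy_loop, toy_vectorize_one, toy_process
and TOY_MAX_VF are made-up names, and the "halve the VF for the epilogue"
rule is just an assumption to keep the example small. The point is only
that the origin is passed down as an argument instead of being parked in
loop->aux.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_VF 8      /* hypothetical widest vectorization factor */
#define TOY_POOL_SIZE 16  /* hypothetical storage for created epilogues */

/* Toy stand-in for a loop; not GCC's struct loop or loop_vec_info.  */
struct toy_loop
{
  int niters;                   /* scalar iterations this loop must cover */
  int vf;                       /* chosen VF, 0 if left scalar */
  const struct toy_loop *orig;  /* loop this is an epilogue of, or NULL */
  struct toy_loop *epilogue;    /* epilogue created for this loop, or NULL */
};

struct toy_loop toy_pool[TOY_POOL_SIZE];
int toy_pool_used;

/* Try to vectorize LOOP.  ORIG is the loop LOOP is an epilogue of, or
   NULL for a top-level loop.  Return the newly created epilogue, or
   NULL if none was created.  */
struct toy_loop *
toy_vectorize_one (struct toy_loop *loop, const struct toy_loop *orig)
{
  /* An epilogue starts from half the VF of its origin.  */
  int vf = orig ? orig->vf / 2 : TOY_MAX_VF;

  loop->orig = orig;
  loop->vf = 0;
  loop->epilogue = NULL;

  /* Shrink the VF until at least one vector iteration fits.  */
  while (vf > 1 && loop->niters < vf)
    vf /= 2;
  if (vf <= 1)
    return NULL;  /* stays scalar */

  loop->vf = vf;
  int rem = loop->niters % vf;
  if (rem == 0)
    return NULL;  /* no leftover iterations, no epilogue */

  assert (toy_pool_used < TOY_POOL_SIZE);
  struct toy_loop *epi = &toy_pool[toy_pool_used++];
  epi->niters = rem;
  epi->vf = 0;
  epi->orig = loop;
  epi->epilogue = NULL;
  loop->epilogue = epi;
  return epi;
}

/* Process LOOP, then recurse on its epilogue with LOOP as the origin.
   Return how many loops in the chain were vectorized.  */
int
toy_process (struct toy_loop *loop, const struct toy_loop *orig)
{
  struct toy_loop *epi = toy_vectorize_one (loop, orig);
  int vectorized = loop->vf > 1;

  if (epi)
    vectorized += toy_process (epi, loop);
  return vectorized;
}
```

The driver would call toy_process (loop, NULL) for each top-level loop.
The epilogue is still handled right after its origin, so the dump ordering
is kept without inserting anything into the loop list.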
>>
>>> + /* FORNOW: Currently alias checks are not inherited for epilogues.
>>> + Don't try to vectorize epilogue because it will require
>>> + additional alias checks. */
>>
>> Are the alias checks here redundant with the ones done for the original
>> loop? If so won't DOM eliminate them?
>
> I revisited this part recently and thought it should actually be safe to
> assume we have no aliasing in the epilogue because we are dominated by the
> alias checks of the original loop. So I prepared a patch to remove this
> restriction and avoid generating alias checks for epilogues (so we compute
> the required alias checks but don't emit them). I haven't sent this patch
> yet. Do you think it is a valid assumption?
>
>>
>>
>> And something just occurred to me -- is there some inherent reason why SLP
>> doesn't vectorize the epilogue, particularly for the cases where we can
>> vectorize the epilogue using smaller vectors? Sorry if you've already
>> answered this somewhere or it's a dumb question.
>
> IIUC this may happen only if we unroll the epilogue into a single BB, which
> happens only when the epilogue iteration count is known. Right?
>
>>
>>
>>
>>>
>>> + /* Add new loop to a processing queue. To make it easier
>>> + to match loop and its epilogue vectorization in dumps
>>> + put new loop as the next loop to process. */
>>> + if (new_loop)
>>> + {
>>> + loops.safe_insert (i + 1, new_loop->num);
>>> + vect_loops_num = number_of_loops (cfun);
>>> + }
>>> +
>>
>> So just to be clear, the only reason to do this is for dumps -- other than
>> processing the loop before its epilogue, there's no other inherently
>> necessary ordering of the loops, right?
>
> Right, I don't see other reasons to do it.
>
> Thanks,
> Ilya
>
>>
>>
>> Jeff