Thanks. I modified the patch so that the max allowed peel iterations
can be specified via a parameter. Testing on going. Ok for trunk ?
David
On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener
<[email protected]> wrote:
> On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <[email protected]>
> wrote:
>> On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
>> <[email protected]> wrote:
>>> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <[email protected]>
>>> wrote:
>>>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>>>> <[email protected]> wrote:
>>>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <[email protected]>
>>>>> wrote:
>>>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>>>> <[email protected]> wrote:
>>>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li
>>>>>>> <[email protected]> wrote:
>>>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>>>>>>> but there is no simple way to turn on loop vectorization alone. The
>>>>>>>> logic for default O3 setting is also complicated.
>>>>>>>>
>>>>>>>> In this patch, two new options are introduced:
>>>>>>>>
>>>>>>>> 1) -ftree-loop-vectorize
>>>>>>>>
>>>>>>>> This option is used to turn on loop vectorization only. option
>>>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>>>>>>> business of Init(2) is needed. With this change, -ftree-vectorize
>>>>>>>> becomes a simple alias to -ftree-loop-vectorize +
>>>>>>>> -ftree-slp-vectorize.
>>>>>>>>
>>>>>>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>>>>>>
>>>>>>>> -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>>>>>>
>>>>>>>> With the new change it becomes:
>>>>>>>>
>>>>>>>> -O3 -fno-loop-vectorize
>>>>>>>>
>>>>>>>>
>>>>>>>> To turn on only loop vectorize at O2, the old way is
>>>>>>>>
>>>>>>>> -O2 -ftree-vectorize -fno-slp-vectorize
>>>>>>>>
>>>>>>>> The new way is
>>>>>>>>
>>>>>>>> -O2 -ftree-loop-vectorize
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2) -ftree-vect-loop-peeling
>>>>>>>>
>>>>>>>> This option is used to turn on/off loop peeling for alignment. In the
>>>>>>>> long run, this should be folded into the cheap cost model proposed by
>>>>>>>> Richard. This option is also useful in scenarios where peeling can
>>>>>>>> introduce runtime problems:
>>>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html which happens to be
>>>>>>>> common in practice.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Patch attached. Compiler boostrapped. Ok after testing?
>>>>>>>
>>>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not
>>>>>>> on 2).
>>>>>>
>>>>>> Ok. Can you also comment on 2) ?
>>>>>
>>>>> I think we want to decide how granular we want to control the vectorizer
>>>>> and using which mechanism. My cost-model re-org makes
>>>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
>>>>> a step backwards in this context.
>>>>
>>>> Using cost model to do a coarse grain control/configuration is
>>>> certainly something we want, but having a fine grain control is still
>>>> useful.
>>>>
>>>>>
>>>>> So, can you summarize what pieces (including versioning) of the vectorizer
>>>>> you'd want to be able to disable separately?
>>>>
>>>> Loop peeling seems to be the main one. There is also a correctness
>>>> issue related. For instance, the following code is common in practice,
>>>> but loop peeling wrongly assumes initial base-alignment and generates
>>>> aligned mov instruction after peeling, leading to SEGV. Peeling is
>>>> not something we can blindly turned on -- even when it is on, there
>>>> should be a way to turn it off explicitly:
>>>>
>>>> char a[10000];
>>>>
>>>> void foo(int n)
>>>> {
>>>> int* b = (int*)(a+n);
>>>> int i = 0;
>>>> for (; i < 1000; ++i)
>>>> b[i] = 1;
>>>> }
>>>>
>>>> int main(int argn, char** argv)
>>>> {
>>>> foo(argn);
>>>> }
>>>
>>> But that's just a bug that should be fixed (looking into it).
>>>
>>>>> Just disabling peeling for
>>>>> alignment may get you into the versioning for alignment path (and thus
>>>>> an unvectorized loop at runtime).
>>>>
>>>> This is not true for target supporting mis-aligned access. I have not
>>>> seen a case where alignment driver loop version happens on x86.
>>>>
>>>>>Also it's know that the alignment peeling
>>>>> code needs some serious TLC (it's outcome depends on the order of DRs,
>>>>> the cost model it uses leaves to be desired as we cannot distinguish
>>>>> between unaligned load and store costs).
>>>>
>>>> Yet another reason to turn it off as it is not effective anyways?
>>>
>>> As said I'll disable all remains of -ftree-vect-loop-version with the cost
>>> model
>>> patch because it wasn't guarding versioning for aliasing but only versioning
>>> for alignment.
>>>
>>> We have to be consistent here - if we add a way to disable peeling for
>>> alignment then we certainly don't want to remove the ability to disable
>>> versioning for alignment, no?
>>
>> We already have the ability to turn off versioning -- via --param. It
>> is a more natural way to fine tune a pass instead of introducing a -f
>> option. For this reason, your planned deprecation of the option is a
>> good thing to do.
>>
>> For consistency, I think we should introduce a new parameter to turn
>> on/off peeling. This can also be tied closely with the cost model
>> change.
>>
>> The proposed patch attached. Does this one look ok?
>
> I'd principally support adding a param but rather would for example make
> it limit the number of peeled iterations (zero turning off the feature). As
> you implemented it it will give false messages for
>
> supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
> do_peeling = vector_alignment_reachable_p (dr);
> if (do_peeling)
> {
> ...
> }
> else
> {
> if (!aligned_access_p (dr))
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> "vector alignment may not be reachable\n");
> break;
> }
> }
>
> the else path. I'd restructure this as
>
> if (!vector_alignment_reachable_p (dr))
> {
> if (!aligned_access_p (dr))
> ....
> continue;
> }
>
> if (known_alignment_for_access_p (dr))
> {
> ... check param against the number of iterations to peel for this DR ....
> }
> else
> {
> ... check param against max iterations ...
> }
>
> an alternative is a tri-state param that would either allow no peeling at all,
> just peeling with known number of iterations or all kind of peelings.
>
> Thanks,
> Richard.
>
>
>> thanks,
>>
>> David
>>
>>
>>
>>>
>>> Richard.
>>>
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>>>
>>>>>>> I've stopped a quick try doing 1) myself because
>>>>>>>
>>>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>>>>>> opts->x_flag_ipa_reference = false;
>>>>>>> break;
>>>>>>>
>>>>>>> + case OPT_ftree_vectorize:
>>>>>>> + if (!opts_set->x_flag_tree_loop_vectorize)
>>>>>>> + opts->x_flag_tree_loop_vectorize = value;
>>>>>>> + if (!opts_set->x_flag_tree_slp_vectorize)
>>>>>>> + opts->x_flag_tree_slp_vectorize = value;
>>>>>>> + break;
>>>>>>>
>>>>>>> doesn't look obviously correct. Does that handle
>>>>>>>
>>>>>>> -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>> -ftree-loop-vectorize -fno-tree-vectorize
>>>>>>>
>>>>>>> properly? Currently at least
>>>>>>>
>>>>>>> -ftree-slp-vectorize -fno-tree-vectorize
>>>>>>>
>>>>>>> doesn't "work".
>>>>>>
>>>>>>
>>>>>> Right -- same is true for -fprofile-use option. FDO enables some
>>>>>> passes, but can not re-enable them if they are flipped off before.
>>>>>>
>>>>>>>
>>>>>>> That said, the option machinery doesn't handle an option being an alias
>>>>>>> for two other options, so it's mechanism to contract positives/negatives
>>>>>>> doesn't work here and the override hooks do not work reliably for
>>>>>>> repeated options.
>>>>>>>
>>>>>>> Or am I wrong here? Should we care at all? Joseph?
>>>>>>
>>>>>> We should probably just document the behavior. Even better, we should
>>>>>> deprecate the old option.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> David
Index: params.def
===================================================================
--- params.def (revision 202834)
+++ params.def (working copy)
@@ -544,6 +544,11 @@ DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA
"Bound on number of runtime checks inserted by the vectorizer's loop
versioning for alias check",
10, 0, 0)
+DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
+ "vect-max-peeling-for-alignment",
+ "Max number of loop peels to enhancement alignment of data references
in a loop",
+ -1, 0, 0)
+
DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
"max-cselib-memory-locations",
"The maximum memory locations recorded by cselib",
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c (revision 202834)
+++ tree-vect-data-refs.c (working copy)
@@ -1718,6 +1718,30 @@ vect_enhance_data_refs_alignment (loop_v
if (do_peeling)
{
+ unsigned max_allowed_peel
+ = PARAM_VALUE (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT);
+ if (max_allowed_peel != (unsigned)-1)
+ {
+ unsigned max_peel = npeel;
+ if (max_peel == 0)
+ {
+ gimple dr_stmt = DR_STMT (dr0);
+ stmt_vec_info vinfo = vinfo_for_stmt (dr_stmt);
+ tree vtype = STMT_VINFO_VECTYPE (vinfo);
+ max_peel = TYPE_VECTOR_SUBPARTS (vtype) - 1;
+ }
+ if (max_peel > max_allowed_peel)
+ {
+ do_peeling = false;
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Disable peeling, max peels reached: %d\n", max_peel);
+ }
+ }
+ }
+
+ if (do_peeling)
+ {
stmt_info_for_cost *si;
void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 202834)
+++ doc/invoke.texi (working copy)
@@ -9451,6 +9451,10 @@ The maximum number of run-time checks th
doing loop versioning for alias in the vectorizer. See option
@option{-ftree-vect-loop-version} for more information.
+@item vect-max-peeling-for-alignment
+The maximum number of loop peels to enhance access alignment
+for vectorizer. Value -1 means 'no limit'.
+
@item max-iterations-to-track
The maximum number of iterations of a loop the brute-force algorithm
for analysis of the number of iterations of the loop tries to evaluate.