Richard, I have some comments about the patch.
> -ftree-vectorizer-verbose=<number> This switch is deprecated. Use
> -fopt-info instead.
>
> ftree-slp-vectorize
> ! Common Report Var(flag_tree_slp_vectorize) Optimization
> Enable basic block vectorization (SLP) on trees
The code dealing with the interactions between -ftree-vectorize, -O3,
etc. is complicated and hard to understand. Would it be better to change
the meaning of -ftree-vectorize to mean -floop-vectorize only, and make
it independent of -fslp-vectorize?
>
> + fvect-cost-model=
> + Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
> + Specifies the cost model for vectorization
> +
> + Enum
> + Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
> +
> + EnumValue
> + Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
> +
> + EnumValue
> + Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
> +
> + EnumValue
> + Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)
Introducing the cheap model is a great change.
> +
> *** 173,179 ****
> {
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> ! if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
> return false;
>
> if (dump_enabled_p ())
> --- 173,180 ----
> {
> struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> ! if (loop_vinfo->cost_model == VECT_COST_MODEL_CHEAP
> ! || (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
> return false;
>
When cost_model == cheap, alignment peeling should also be disabled.
There will still be loops that benefit from vectorization without
peeling, though perhaps with a reduced net runtime gain.
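
As a rough sketch of this suggestion: the following is a self-contained
model, not actual GCC code. The enum mirrors the values named in the
patch, while both helper functions and their parameter names are
hypothetical and exist only to illustrate the policy (the cheap model
refuses both alias-check versioning and peeling for alignment).

```c
#include <stdbool.h>

/* Mirrors the enum values named in the patch; the ordering is assumed.  */
enum vect_cost_model
{
  VECT_COST_MODEL_UNLIMITED,
  VECT_COST_MODEL_DYNAMIC,
  VECT_COST_MODEL_CHEAP
};

/* Hypothetical helper modelling the check in the hunk above: no loop
   versioning for alias checks under the cheap model, or when the
   maximum number of alias checks is zero.  */
static bool
model_allows_alias_versioning (enum vect_cost_model model,
                               unsigned max_alias_checks)
{
  if (model == VECT_COST_MODEL_CHEAP || max_alias_checks == 0)
    return false;
  return true;
}

/* Hypothetical helper for the suggested extension: the cheap model
   would also refuse peeling for alignment.  */
static bool
model_allows_alignment_peeling (enum vect_cost_model model)
{
  return model != VECT_COST_MODEL_CHEAP;
}
```

Under this sketch a loop can still be vectorized with the cheap model,
just without the extra code that versioning or peeling would generate.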
> struct gimple_opt_pass pass_slp_vectorize =
> --- 206,220 ----
> static bool
> gate_vect_slp (void)
> {
> ! /* Apply SLP either according to whether the user specified whether to
> ! run SLP or not, or according to whether the user specified whether
> ! to do vectorization or not. */
> ! if (global_options_set.x_flag_tree_slp_vectorize)
> ! return flag_tree_slp_vectorize != 0;
> ! if (global_options_set.x_flag_tree_vectorize)
> ! return flag_tree_vectorize != 0;
> ! /* And if vectorization was enabled by default run SLP only at -O3. */
> ! return flag_tree_vectorize != 0 && optimize == 3;
> }
The logic can be greatly simplified if the SLP vectorizer is controlled
independently, which is also easier for users to understand.
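
To illustrate how small the gates could become, here is a minimal
sketch. It is a standalone model rather than a patch: the two variables
stand in for the real option variables, flag_tree_loop_vectorize is a
hypothetical name for an independent loop vectorizer flag, and
gate_vect_loop is a made-up name for the loop vectorizer's gate.

```c
#include <stdbool.h>

/* Stand-ins for the option variables (assumption: in GCC these would be
   generated from the .opt file, not declared by hand).  */
static int flag_tree_slp_vectorize;
static int flag_tree_loop_vectorize;   /* hypothetical independent flag */

/* SLP runs if and only if its own flag is on.  */
static bool
gate_vect_slp (void)
{
  return flag_tree_slp_vectorize != 0;
}

/* Likewise for the loop vectorizer (hypothetical gate name).  */
static bool
gate_vect_loop (void)
{
  return flag_tree_loop_vectorize != 0;
}
```

With this shape, any defaults implied by -O3 would be applied once when
options are processed, and the gates would not need to inspect
global_options_set at all.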
> ! @item -fvect-cost-model=@var{model}
> @opindex fvect-cost-model
> ! Alter the cost model used for vectorization. The @var{model} argument
> ! should be one of @code{unlimited}, @code{dynamic} or @code{cheap}.
> ! With the @code{unlimited} model the vectorized code-path is assumed
> ! to be profitable while with the @code{dynamic} model a runtime check
> ! will guard the vectorized code-path to enable it only for iteration
> ! counts that will likely execute faster than when executing the original
> ! scalar loop. The @code{cheap} model will disable vectorization of
> ! loops where doing so would be cost prohibitive for example due to
> ! required runtime checks for data dependence or alignment but otherwise
> ! is equal to the @code{dynamic} model.
> ! The default cost model depends on other optimization flags and is
> ! either @code{dynamic} or @code{cheap}.
>
In theory the vectorizer will only vectorize a loop when there is a net
runtime gain, so the 'cost' here should only mean code size and compile
time cost.
Cheap Model: with this model, the compiler will vectorize loops that
are considered beneficial for runtime performance with minimal code
size increase and compile time cost.
Unlimited Model: the compiler will vectorize loops to maximize runtime
gain without considering compile time cost or the impact on code size.
thanks,
David