Richard, I have some comments about the patch.

>   -ftree-vectorizer-verbose=<number>  This switch is deprecated. Use -fopt-info instead.
>
>   ftree-slp-vectorize
> ! Common Report Var(flag_tree_slp_vectorize) Optimization
>   Enable basic block vectorization (SLP) on trees

The code dealing with the interactions between -ftree-vectorize, -O3,
etc. is complicated and hard to understand. Would it be better to change
-ftree-vectorize to mean -floop-vectorize only, and make it independent
of -fslp-vectorize?

>
> + fvect-cost-model=
> + Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
> + Specifies the cost model for vectorization
> +
> + Enum
> + Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
> +
> + EnumValue
> + Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
> +
> + EnumValue
> + Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
> +
> + EnumValue
> + Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)

Introducing the cheap model is a great change.

> +
> *** 173,179 ****
>   {
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> !   if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
>       return false;
>
>     if (dump_enabled_p ())
> --- 173,180 ----
>   {
>     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> !   if (loop_vinfo->cost_model == VECT_COST_MODEL_CHEAP
> !       || (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
>       return false;
>

When cost_model == cheap, alignment peeling should also be disabled.
There will still be loops that benefit from vectorization without
peeling, though perhaps with a reduced net runtime gain.

>   struct gimple_opt_pass pass_slp_vectorize =
> --- 206,220 ----
>   static bool
>   gate_vect_slp (void)
>   {
> !   /* Apply SLP either according to whether the user specified whether to
> !      run SLP or not, or according to whether the user specified whether
> !      to do vectorization or not.  */
> !   if (global_options_set.x_flag_tree_slp_vectorize)
> !     return flag_tree_slp_vectorize != 0;
> !   if (global_options_set.x_flag_tree_vectorize)
> !     return flag_tree_vectorize != 0;
> !
> !   /* And if vectorization was enabled by default run SLP only at -O3.  */
> !   return flag_tree_vectorize != 0 && optimize == 3;
>   }

The logic can be greatly simplified if the SLP vectorizer is controlled
independently, which would also be easier for users to understand.

> ! @item -fvect-cost-model=@var{model}
>   @opindex fvect-cost-model
> ! Alter the cost model used for vectorization.  The @var{model} argument
> ! should be one of @code{unlimited}, @code{dynamic} or @code{cheap}.
> ! With the @code{unlimited} model the vectorized code-path is assumed
> ! to be profitable while with the @code{dynamic} model a runtime check
> ! will guard the vectorized code-path to enable it only for iteration
> ! counts that will likely execute faster than when executing the original
> ! scalar loop.  The @code{cheap} model will disable vectorization of
> ! loops where doing so would be cost prohibitive for example due to
> ! required runtime checks for data dependence or alignment but otherwise
> ! is equal to the @code{dynamic} model.
> ! The default cost model depends on other optimization flags and is
> ! either @code{dynamic} or @code{cheap}.
>

In theory the vectorizer only vectorizes a loop when there is a net
runtime gain, so the 'cost' here should only mean code size and compile
time cost.

Cheap model: with this model, the compiler will vectorize loops that are
considered beneficial for runtime performance with minimal code size
increase and compile time cost.

Unlimited model: the compiler will vectorize loops to maximize runtime
gain without considering compile time cost or the impact on code size.

thanks,

David
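
For illustration only, the point about gate_vect_slp can be sketched as standalone C. This is not GCC code: struct opts, its fields, and both function names are hypothetical stand-ins for global_options_set, flag_tree_slp_vectorize, flag_tree_vectorize and optimize. The "independent" variant assumes SLP would be governed by its own flag alone, which is only one possible reading of the suggestion.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the option state the gate consults.  */
struct opts
{
  bool slp_set;   /* user passed -f[no-]tree-slp-vectorize explicitly */
  bool slp;       /* current value of the SLP flag */
  bool vect_set;  /* user passed -f[no-]tree-vectorize explicitly */
  bool vect;      /* current value of the vectorize flag */
  int optimize;   /* -O level */
};

/* Mirrors the three-way decision in the quoted patch.  */
static bool
gate_slp_patch (const struct opts *o)
{
  if (o->slp_set)
    return o->slp;
  if (o->vect_set)
    return o->vect;
  /* Vectorization enabled by default: run SLP only at -O3.  */
  return o->vect && o->optimize == 3;
}

/* Assumed alternative: SLP depends only on its own flag.  */
static bool
gate_slp_independent (const struct opts *o)
{
  return o->slp;
}
```

With -ftree-vectorize given explicitly at -O2 and no SLP option, the first gate enables SLP as a side effect of the loop vectorizer flag, while the second leaves it off unless its own flag (assumed off by default in this sketch) is set.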