For Transactional Memory support, we also create versions of functions (see code in trunk, e.g., in trans-mem.c). Right now, that's a single instrumented version of the original code but having different transactional instrumentations available might be worthwhile in the future.
Is there a chance that we can share any code between TM and what you are working on? Torvald On Thu, 2011-12-15 at 22:53 -0800, Sriraman Tallam wrote: > Hi, > > I am working on user-directed and compiler-directed function > multiversioning which has been discussed in these threads: > > 1) http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02344.html > 2) http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html > > The gist of the discussions for user-directed multiversioning is that > it should use a function overloading mechanism to allow the user to > specify multiple versions of the same function and/or use a function > attribute to specify that a particular function must be multiversioned. > Afterwards, a call to such a function is appropriately dispatched by > the compiler. This work is in progress. However, this patch is *not* > about this. > > This patch does compiler-directed multi-versioning, which is to allow > the compiler to automatically version functions in order to > exploit uArch features to maximize performance for a set of selected > target platforms in the same binary. I have added a new flag, mvarch, > to allow the user to specify the arches on which the generated > binary will run. > More than one arch name is allowed, for instance, -mvarch=core2, > corei7 (arch names the same as those allowed by -march). The compiler will > then automatically create function versions that are specialized for > these arches by tagging "-mtune=<arch>" on the versions. It will only > create versions of those functions where it sees opportunities for > performance improvement. > > As a use case, I have added versioning for core2 where the function > will be optimized for vectorization. 
I submitted a patch recently to > not allow vectorization of loops in core2 if unaligned vector > load/stores are generated as these are very slow: > http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00955.html > With mvarch=core2, the compiler will identify functions with > vectorizable loads/stores in loops and create a version for core2. The > core2 version will not have unaligned vector load/stores whereas the > default will. > > It is also easy to add versioning criteria for other arches or add more > versioning criteria for core2. I already experimented with one other > versioning criterion for corei7 and I plan to add that in a follow-up > patch. Basically, any mtune-specific optimizations can be plugged in > as a criterion to version. For this patch, only the vectorization-based > versioning is shown. > > The version dispatch happens via the IFUNC mechanism to keep the > run-time overhead of version dispatch minimal. When the compiler has > to version a function foo, it makes two copies, foo.autoclone.original > and foo.autoclone.clone0. It then modifies foo by replacing it with an > ifunc call to these two versions based on the outcome of a run-time > check for the processor type. > > The function cloning for preventing vectorization on core2 is done > aggressively as it conservatively checks if a particular function can > generate unaligned accesses. This is necessary as the cloning pass > happens early whereas the actual vectorization happens much later in > the loop optimization passes. Hence, the vectorization pass sees > a different IR and the same checks to detect unaligned accesses cannot > be reused here. So, it could turn out that the final code generated > in all the function versions is identical. I am working on solutions > to this problem but there is the ICF feature in the gold linker which > can detect identical function bodies and merge them. 
Note that this > need not always be true for other optimizations, if it is possible to > detect with high accuracy whether a function will benefit > from versioning. > > Regarding the placement of the versioning pass in the pass order, it > comes after inlining; otherwise the ifunc calls would prevent inlining > of the functions. Also, it was desired that all versioning decisions > happen at one point and before any target-specific > optimizations kick in. Hence, it was chosen to come just after the > ipa-inline pass. > > This optimization was tested on one of our image-processing-related > benchmarks and the text size bloat from auto-versioning was about 20%. > The performance improvement from tree vectorization was ~22% on corei7, > 10% on AMD Istanbul and ~2% on core2. Without this versioning, tree > vectorization was deteriorating the performance by ~6%. > > > * mversn-dispatch.c (make_name): Use '.' to concatenate to suffix > mangled names. > (clone_function): Do not make clones ctors/dtors. Recompute dominance > info. > (make_bb_flow): New function. > (get_selector_gimple_seq): New function. > (make_selector_function): New function. > (make_attribute): New function. > (make_ifunc_function): New function. > (copy_decl_attributes): New function. > (dispatch_using_ifunc): New function. > (purge_function_body): New function. > (function_can_make_abnormal_goto): New function. > (make_function_not_cloneable): New function. > (do_auto_clone): New function. > (pass_auto_clone): New gimple pass. > * passes.c (init_optimization_passes): Add pass_auto_clone to list. > * tree-pass.h (pass_auto_clone): New pass. > * params.def (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING): New param. > * target.def (mversion_function): New target hook. > * config/i386/i386.c (ix86_option_override_internal): Check correctness > of ix86_mv_arch_string. > (add_condition_to_bb): New function. > (make_empty_function): New function. > (make_condition_function): New function. 
> (is_loop_form_vectorizable): New function. > (is_loop_stmts_vectorizable): New function. > (any_loops_vectorizable_with_load_store): New function. > (mversion_for_core2): New function. > (ix86_mversion_function): New function. > * config/i386/i386.opt (mvarch): New option. > * doc/tm.texi (TARGET_MVERSION_FUNCTION): Document. > * doc/tm.texi.in (TARGET_MVERSION_FUNCTION): Document. > * testsuite/gcc.dg/automversn_1.c: New testcase. > > Index: doc/tm.texi > =================================================================== > --- doc/tm.texi (revision 182355) > +++ doc/tm.texi (working copy) > @@ -10927,6 +10927,11 @@ The result is another tree containing a simplified > call's result. If @var{ignore} is true the value will be ignored. > @end deftypefn > > +@deftypefn {Target Hook} int TARGET_MVERSION_FUNCTION (tree @var{fndecl}, > tree *@var{optimization_node_chain}, tree *@var{cond_func_decl}) > +Check if a function needs to be multi-versioned to support variants of > +this architecture. @var{fndecl} is the declaration of the function. > +@end deftypefn > + > @deftypefn {Target Hook} bool TARGET_SLOW_UNALIGNED_VECTOR_MEMOP (void) > Return true if unaligned vector memory load/store is a slow operation > on this target. > Index: doc/tm.texi.in > =================================================================== > --- doc/tm.texi.in (revision 182355) > +++ doc/tm.texi.in (working copy) > @@ -10873,6 +10873,11 @@ The result is another tree containing a simplified > call's result. If @var{ignore} is true the value will be ignored. > @end deftypefn > > +@hook TARGET_MVERSION_FUNCTION > +Check if a function needs to be multi-versioned to support variants of > +this architecture. @var{fndecl} is the declaration of the function. > +@end deftypefn > + > @hook TARGET_SLOW_UNALIGNED_VECTOR_MEMOP > Return true if unaligned vector memory load/store is a slow operation > on this target. 
> Index: target.def > =================================================================== > --- target.def (revision 182355) > +++ target.def (working copy) > @@ -1277,6 +1277,12 @@ DEFHOOK > "", > bool, (void), NULL) > > +/* Target hook to check if this function should be versioned. */ > +DEFHOOK > +(mversion_function, > + "", > + int, (tree fndecl, tree *optimization_node_chain, tree *cond_func_decl), > NULL) > + > /* Returns a code for a target-specific builtin that implements > reciprocal of the function, or NULL_TREE if not available. */ > DEFHOOK > Index: tree-pass.h > =================================================================== > --- tree-pass.h (revision 182355) > +++ tree-pass.h (working copy) > @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_split_functions > extern struct gimple_opt_pass pass_feedback_split_functions; > extern struct gimple_opt_pass pass_threadsafe_analyze; > extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch; > +extern struct gimple_opt_pass pass_auto_clone; > > /* IPA Passes */ > extern struct simple_ipa_opt_pass pass_ipa_lower_emutls; > Index: testsuite/gcc.dg/automversn_1.c > =================================================================== > --- testsuite/gcc.dg/automversn_1.c (revision 0) > +++ testsuite/gcc.dg/automversn_1.c (revision 0) > @@ -0,0 +1,27 @@ > +/* Check that the auto_clone pass works correctly. Function foo must be > cloned > + because it is hot and has a vectorizable store. 
*/ > + > +/* { dg-options "-O2 -ftree-vectorize -mvarch=core2 -fdump-tree-auto_clone" > } */ > +/* { dg-do run } */ > + > +char a[16]; > + > +int __attribute__ ((hot)) __attribute__ ((noinline)) > +foo (void) > +{ > + int i; > + for (i = 0; i< 16; i++) > + a[i] = 0; > + return 0; > +} > + > +int > +main () > +{ > + return foo (); > +} > + > + > +/* { dg-final { scan-tree-dump "foo\.autoclone\.original" "auto_clone" } } */ > +/* { dg-final { scan-tree-dump "foo\.autoclone\.0" "auto_clone" } } */ > +/* { dg-final { cleanup-tree-dump "auto_clone" } } */ > Index: mversn-dispatch.c > =================================================================== > --- mversn-dispatch.c (revision 182355) > +++ mversn-dispatch.c (working copy) > @@ -135,6 +135,8 @@ along with GCC; see the file COPYING3. If not see > #include "output.h" > #include "vecprim.h" > #include "gimple-pretty-print.h" > +#include "target.h" > +#include "cfgloop.h" > > typedef struct cgraph_node* NODEPTR; > DEF_VEC_P (NODEPTR); > @@ -212,8 +214,7 @@ function_args_count (tree fntype) > return num; > } > > -/* Return the variable name (global/constructor) to use for the > - version_selector function with name of DECL by appending SUFFIX. */ > +/* Return a new name by appending SUFFIX to the DECL name. */ > > static char * > make_name (tree decl, const char *suffix) > @@ -226,7 +227,8 @@ make_name (tree decl, const char *suffix) > > name_len = strlen (name) + strlen (suffix) + 2; > global_var_name = (char *) xmalloc (name_len); > - snprintf (global_var_name, name_len, "%s_%s", name, suffix); > + /* Use '.' to concatenate names as it is demangler friendly. 
*/ > + snprintf (global_var_name, name_len, "%s.%s", name, suffix); > return global_var_name; > } > > @@ -246,9 +248,9 @@ static char* > make_feature_test_global_name (tree decl, bool is_constructor) > { > if (is_constructor) > - return make_name (decl, "version_selector_constructor"); > + return make_name (decl, "version.selector.constructor"); > > - return make_name (decl, "version_selector_global"); > + return make_name (decl, "version.selector.global"); > } > > /* This function creates a new VAR_DECL with attributes set > @@ -865,6 +867,9 @@ empty_function_body (tree fndecl) > e = make_edge (new_bb, EXIT_BLOCK_PTR, 0); > gcc_assert (e != NULL); > > + if (dump_file) > + dump_function_to_file (current_function_decl, dump_file, TDF_BLOCKS); > + > current_function_decl = old_current_function_decl; > pop_cfun (); > return new_bb; > @@ -921,6 +926,10 @@ clone_function (tree orig_fndecl, const char *name > push_cfun (DECL_STRUCT_FUNCTION (new_decl)); > current_function_decl = new_decl; > > + /* The clones should not be ctors or dtors. 
*/ > + DECL_STATIC_CONSTRUCTOR (new_decl) = 0; > + DECL_STATIC_DESTRUCTOR (new_decl) = 0; > + > TREE_READONLY (new_decl) = TREE_READONLY (orig_fndecl); > TREE_STATIC (new_decl) = TREE_STATIC (orig_fndecl); > TREE_USED (new_decl) = TREE_USED (orig_fndecl); > @@ -954,6 +963,12 @@ clone_function (tree orig_fndecl, const char *name > cgraph_call_function_insertion_hooks (new_version); > cgraph_mark_needed_node (new_version); > > + > + free_dominance_info (CDI_DOMINATORS); > + free_dominance_info (CDI_POST_DOMINATORS); > + calculate_dominance_info (CDI_DOMINATORS); > + calculate_dominance_info (CDI_POST_DOMINATORS); > + > pop_cfun (); > current_function_decl = old_current_function_decl; > > @@ -1034,9 +1049,9 @@ make_specialized_call_to_clone (gimple generic_stm > gcc_assert (generic_fndecl != NULL); > > if (side == 0) > - new_name = make_name (generic_fndecl, "clone_0"); > + new_name = make_name (generic_fndecl, "clone.0"); > else > - new_name = make_name (generic_fndecl, "clone_1"); > + new_name = make_name (generic_fndecl, "clone.1"); > > slot = htab_find_slot_with_hash (name_decl_htab, new_name, > htab_hash_string (new_name), NO_INSERT); > @@ -1764,3 +1779,700 @@ struct gimple_opt_pass pass_tree_convert_builtin_d > TODO_update_ssa | TODO_verify_ssa > } > }; > + > +/* This function generates gimple code in NEW_BB to check if COND_VAR > + is equal to WHICH_VERSION and return FN_VER pointer if it is equal. > + The basic block returned is the block where the control flows if > + the equality is false. */ > + > +static basic_block > +make_bb_flow (basic_block new_bb, tree cond_var, tree fn_ver, > + int which_version, tree bindings) > +{ > + tree result_var; > + tree convert_expr; > + > + basic_block bb1, bb2, bb3; > + edge e12, e23; > + > + gimple if_else_stmt; > + gimple if_stmt; > + gimple return_stmt; > + gimple_seq gseq = bb_seq (new_bb); > + > + /* Check if the value of cond_var is equal to which_version. 
*/ > + if_else_stmt = gimple_build_cond (EQ_EXPR, cond_var, > + build_int_cst (NULL, which_version), > + NULL_TREE, NULL_TREE); > + > + mark_symbols_for_renaming (if_else_stmt); > + gimple_seq_add_stmt (&gseq, if_else_stmt); > + gimple_set_block (if_else_stmt, bindings); > + gimple_set_bb (if_else_stmt, new_bb); > + > + result_var = create_tmp_var (ptr_type_node, NULL); > + add_referenced_var (result_var); > + > + convert_expr = build1 (CONVERT_EXPR, ptr_type_node, fn_ver); > + if_stmt = gimple_build_assign (result_var, convert_expr); > + mark_symbols_for_renaming (if_stmt); > + gimple_seq_add_stmt (&gseq, if_stmt); > + gimple_set_block (if_stmt, bindings); > + > + return_stmt = gimple_build_return (result_var); > + mark_symbols_for_renaming (return_stmt); > + gimple_seq_add_stmt (&gseq, return_stmt); > + > + set_bb_seq (new_bb, gseq); > + > + bb1 = new_bb; > + e12 = split_block (bb1, if_else_stmt); > + bb2 = e12->dest; > + e12->flags &= ~EDGE_FALLTHRU; > + e12->flags |= EDGE_TRUE_VALUE; > + > + e23 = split_block (bb2, return_stmt); > + gimple_set_bb (if_stmt, bb2); > + gimple_set_bb (return_stmt, bb2); > + bb3 = e23->dest; > + make_edge (bb1, bb3, EDGE_FALSE_VALUE); > + > + remove_edge (e23); > + make_edge (bb2, EXIT_BLOCK_PTR, 0); > + > + return bb3; > +} > + > +/* Given the pointer to the condition function COND_FUNC_ARG, whose return > + value decides the version that gets executed, and the pointers to the > + function versions, FN_VER_LIST, this function generates control-flow to > + return the appropriate function version pointer based on the return value > + of the conditional function. The condition function is assumed to return > + values 0, 1, 2, ... 
*/ > + > +static gimple_seq > +get_selector_gimple_seq (tree cond_func_arg, tree fn_ver_list, tree > default_ver, > + basic_block new_bb, tree bindings) > +{ > + basic_block final_bb; > + > + gimple return_stmt, default_stmt; > + gimple_seq gseq = NULL; > + gimple_seq gseq_final = NULL; > + gimple call_cond_stmt; > + > + tree result_var; > + tree convert_expr; > + tree p; > + tree cond_var; > + > + int which_version; > + > + /* Call the condition function once and store the outcome in cond_var. */ > + cond_var = create_tmp_var (integer_type_node, NULL); > + call_cond_stmt = gimple_build_call (cond_func_arg, 0); > + gimple_call_set_lhs (call_cond_stmt, cond_var); > + add_referenced_var (cond_var); > + mark_symbols_for_renaming (call_cond_stmt); > + > + gimple_seq_add_stmt (&gseq, call_cond_stmt); > + gimple_set_block (call_cond_stmt, bindings); > + gimple_set_bb (call_cond_stmt, new_bb); > + > + set_bb_seq (new_bb, gseq); > + > + final_bb = new_bb; > + > + which_version = 0; > + for (p = fn_ver_list; p != NULL_TREE; p = TREE_CHAIN (p)) > + { > + tree ver = TREE_PURPOSE (p); > + /* Return this version's pointer, VER, if the value returned by the > + condition function is equal to WHICH_VERSION. */ > + final_bb = make_bb_flow (final_bb, cond_var, ver, which_version, > + bindings); > + which_version++; > + } > + > + result_var = create_tmp_var (ptr_type_node, NULL); > + add_referenced_var (result_var); > + > + /* Return the default version function pointer as the default. 
*/ > + convert_expr = build1 (CONVERT_EXPR, ptr_type_node, default_ver); > + default_stmt = gimple_build_assign (result_var, convert_expr); > + mark_symbols_for_renaming (default_stmt); > + gimple_seq_add_stmt (&gseq_final, default_stmt); > + gimple_set_block (default_stmt, bindings); > + gimple_set_bb (default_stmt, final_bb); > + > + return_stmt = gimple_build_return (result_var); > + mark_symbols_for_renaming (return_stmt); > + gimple_seq_add_stmt (&gseq_final, return_stmt); > + gimple_set_bb (return_stmt, final_bb); > + > + set_bb_seq (final_bb, gseq_final); > + > + return gseq; > +} > + > +/* Make the ifunc selector function which calls function pointed to by > + COND_FUNC_ARG and checks the value to return the appropriate function > + version pointer. */ > + > +static tree > +make_selector_function (const char *name, tree cond_func_arg, > + tree fn_ver_list, tree default_ver) > +{ > + tree decl, type, t; > + basic_block new_bb; > + tree old_current_function_decl; > + tree decl_name; > + > + /* The selector function should return a (void *). */ > + type = build_function_type_list (ptr_type_node, NULL_TREE); > + > + decl = build_fn_decl (name, type); > + > + decl_name = get_identifier (name); > + SET_DECL_ASSEMBLER_NAME (decl, decl_name); > + DECL_NAME (decl) = decl_name; > + gcc_assert (cgraph_node (decl) != NULL); > + > + TREE_USED (decl) = 1; > + DECL_ARTIFICIAL (decl) = 1; > + DECL_IGNORED_P (decl) = 0; > + TREE_PUBLIC (decl) = 0; > + DECL_UNINLINABLE (decl) = 1; > + DECL_EXTERNAL (decl) = 0; > + DECL_CONTEXT (decl) = NULL_TREE; > + DECL_INITIAL (decl) = make_node (BLOCK); > + DECL_STATIC_CONSTRUCTOR (decl) = 0; > + TREE_READONLY (decl) = 0; > + DECL_PURE_P (decl) = 0; > + > + /* Build result decl and add to function_decl. 
*/ > + t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node); > + DECL_ARTIFICIAL (t) = 1; > + DECL_IGNORED_P (t) = 1; > + DECL_RESULT (decl) = t; > + > + gimplify_function_tree (decl); > + > + old_current_function_decl = current_function_decl; > + push_cfun (DECL_STRUCT_FUNCTION (decl)); > + current_function_decl = decl; > + init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl)); > + > + cfun->curr_properties |= > + (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars | > + PROP_ssa); > + > + new_bb = create_empty_bb (ENTRY_BLOCK_PTR); > + make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU); > + make_edge (new_bb, EXIT_BLOCK_PTR, 0); > + > + /* This call is very important if this pass runs when the IR is in > + SSA form. It breaks things in strange ways otherwise. */ > + init_tree_ssa (DECL_STRUCT_FUNCTION (decl)); > + init_ssa_operands (); > + > + /* Make the body of the selector function. */ > + get_selector_gimple_seq (cond_func_arg, fn_ver_list, default_ver, new_bb, > + DECL_INITIAL (decl)); > + > + cgraph_add_new_function (decl, true); > + cgraph_call_function_insertion_hooks (cgraph_node (decl)); > + cgraph_mark_needed_node (cgraph_node (decl)); > + > + if (dump_file) > + dump_function_to_file (decl, dump_file, TDF_BLOCKS); > + > + pop_cfun (); > + current_function_decl = old_current_function_decl; > + return decl; > +} > + > +/* Makes a function attribute of the form NAME(ARG_NAME) and chains > + it to CHAIN. 
*/ > + > +static tree > +make_attribute (const char *name, const char *arg_name, tree chain) > +{ > + tree attr_name; > + tree attr_arg_name; > + tree attr_args; > + tree attr; > + > + attr_name = get_identifier (name); > + attr_arg_name = build_string (strlen (arg_name), arg_name); > + attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); > + attr = tree_cons (attr_name, attr_args, chain); > + return attr; > +} > + > +/* This creates the ifunc function IFUNC_NAME whose selector function is > + SELECTOR_NAME. */ > + > +static tree > +make_ifunc_function (const char* ifunc_name, const char *selector_name, > + tree fn_type) > +{ > + tree type; > + tree decl; > + > + /* The signature of the ifunc function is set to the > + type of any version. */ > + type = build_function_type (TREE_TYPE (fn_type), TYPE_ARG_TYPES (fn_type)); > + decl = build_fn_decl (ifunc_name, type); > + > + DECL_CONTEXT (decl) = NULL_TREE; > + DECL_INITIAL (decl) = error_mark_node; > + > + /* Set ifunc attribute */ > + DECL_ATTRIBUTES (decl) > + = make_attribute ("ifunc", selector_name, DECL_ATTRIBUTES (decl)); > + > + assemble_alias (decl, get_identifier (selector_name)); > + > + return decl; > +} > + > +/* Copy the decl attributes from from_decl to to_decl, except > + DECL_ARTIFICIAL and TREE_PUBLIC. 
*/ > + > +static void > +copy_decl_attributes (tree to_decl, tree from_decl) > +{ > + TREE_READONLY (to_decl) = TREE_READONLY (from_decl); > + TREE_USED (to_decl) = TREE_USED (from_decl); > + DECL_ARTIFICIAL (to_decl) = 1; > + DECL_IGNORED_P (to_decl) = DECL_IGNORED_P (from_decl); > + TREE_PUBLIC (to_decl) = 0; > + DECL_CONTEXT (to_decl) = DECL_CONTEXT (from_decl); > + DECL_EXTERNAL (to_decl) = DECL_EXTERNAL (from_decl); > + DECL_COMDAT (to_decl) = DECL_COMDAT (from_decl); > + DECL_COMDAT_GROUP (to_decl) = DECL_COMDAT_GROUP (from_decl); > + DECL_VIRTUAL_P (to_decl) = DECL_VIRTUAL_P (from_decl); > + DECL_WEAK (to_decl) = DECL_WEAK (from_decl); > +} > + > +/* This function does the multi-version run-time dispatch using IFUNC. Given > + NUM_VERSIONS versions of a function with the decls in FN_VER_LIST along > + with a default version in DEFAULT_VER. Also given is a condition > function, > + COND_FUNC_ADDR, whose return value decides the version that gets executed. > + This function generates the necessary code to dispatch the right function > + version and returns this as a GIMPLE_SEQ. The decls of the ifunc function and > + the selector function that are created are stored in IFUNC_DECL and > + SELECTOR_DECL. */ > + > +static gimple_seq > +dispatch_using_ifunc (int num_versions, tree orig_func_decl, > + tree cond_func_addr, tree fn_ver_list, > + tree default_ver, tree *selector_decl, > + tree *ifunc_decl) > +{ > + char *selector_name; > + char *ifunc_name; > + tree ifunc_function; > + tree selector_function; > + tree return_type; > + VEC (tree, heap) *nargs = NULL; > + tree arg; > + gimple ifunc_call_stmt; > + gimple return_stmt; > + gimple_seq gseq = NULL; > + > + gcc_assert (cond_func_addr != NULL > + && num_versions > 0 > + && orig_func_decl != NULL > + && fn_ver_list != NULL); > + > + /* The return type of any function version. 
*/ > + return_type = TREE_TYPE (TREE_TYPE (orig_func_decl)); > + > + nargs = VEC_alloc (tree, heap, 4); > + > + for (arg = DECL_ARGUMENTS (orig_func_decl); > + arg; arg = TREE_CHAIN (arg)) > + { > + VEC_safe_push (tree, heap, nargs, arg); > + add_referenced_var (arg); > + } > + > + /* Assign names to ifunc and ifunc_selector functions. */ > + selector_name = make_name (orig_func_decl, "ifunc.selector"); > + ifunc_name = make_name (orig_func_decl, "ifunc"); > + > + /* Make a selector function which returns the appropriate function > + version pointer based on the outcome of the condition function > + execution. */ > + selector_function = make_selector_function (selector_name, cond_func_addr, > + fn_ver_list, default_ver); > + *selector_decl = selector_function; > + > + /* Make a new ifunc function. */ > + ifunc_function = make_ifunc_function (ifunc_name, selector_name, > + TREE_TYPE (orig_func_decl)); > + *ifunc_decl = ifunc_function; > + > + /* Make selector and ifunc shadow the attributes of the original function. > */ > + copy_decl_attributes (ifunc_function, orig_func_decl); > + copy_decl_attributes (selector_function, orig_func_decl); > + > + ifunc_call_stmt = gimple_build_call_vec (ifunc_function, nargs); > + gimple_seq_add_stmt (&gseq, ifunc_call_stmt); > + > + /* Make the function return the value if it has a non-void type. */ > + if (TREE_CODE (return_type) != VOID_TYPE) > + { > + tree lhs_var; > + tree lhs_var_ssa_name; > + tree result_decl; > + > + result_decl = DECL_RESULT (orig_func_decl); > + > + if (result_decl > + && aggregate_value_p (result_decl, orig_func_decl) > + && !TREE_ADDRESSABLE (result_decl)) > + { > + /* Build a RESULT_DECL rather than a VAR_DECL for this case. > + See tree-nrv.c: tree_nrv. It checks if the DECL_RESULT and the > + return value are the same. 
*/ > + lhs_var = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL, > + return_type); > + DECL_ARTIFICIAL (lhs_var) = 1; > + DECL_IGNORED_P (lhs_var) = 1; > + TREE_READONLY (lhs_var) = 0; > + DECL_EXTERNAL (lhs_var) = 0; > + TREE_STATIC (lhs_var) = 0; > + TREE_USED (lhs_var) = 1; > + > + add_referenced_var (lhs_var); > + DECL_RESULT (orig_func_decl) = lhs_var; > + } > + else if (!TREE_ADDRESSABLE (return_type) > + && COMPLETE_TYPE_P (return_type)) > + { > + lhs_var = create_tmp_var (return_type, NULL); > + add_referenced_var (lhs_var); > + } > + else > + { > + lhs_var = create_tmp_var_raw (return_type, NULL); > + TREE_ADDRESSABLE (lhs_var) = 1; > + gimple_add_tmp_var (lhs_var); > + add_referenced_var (lhs_var); > + } > + > + if (AGGREGATE_TYPE_P (return_type) > + || TREE_CODE (return_type) == COMPLEX_TYPE) > + { > + gimple_call_set_lhs (ifunc_call_stmt, lhs_var); > + return_stmt = gimple_build_return (lhs_var); > + } > + else > + { > + lhs_var_ssa_name = make_ssa_name (lhs_var, ifunc_call_stmt); > + gimple_call_set_lhs (ifunc_call_stmt, lhs_var_ssa_name); > + return_stmt = gimple_build_return (lhs_var_ssa_name); > + } > + } > + else > + { > + return_stmt = gimple_build_return (NULL_TREE); > + } > + > + mark_symbols_for_renaming (ifunc_call_stmt); > + mark_symbols_for_renaming (return_stmt); > + gimple_seq_add_stmt (&gseq, return_stmt); > + > + VEC_free (tree, heap, nargs); > + return gseq; > +} > + > +/* Empty the function body of function fndecl. Retain just one basic block > + along with the ENTRY and EXIT block. Return the retained basic block. */ > + > +static basic_block > +purge_function_body (tree fndecl) > +{ > + basic_block bb, new_bb; > + edge first_edge, last_edge; > + tree old_current_function_decl; > + > + old_current_function_decl = current_function_decl; > + push_cfun (DECL_STRUCT_FUNCTION (fndecl)); > + current_function_decl = fndecl; > + > + /* Set new_bb to be the first block after ENTRY_BLOCK_PTR. 
*/ > + > + first_edge = VEC_index (edge, ENTRY_BLOCK_PTR->succs, 0); > + new_bb = first_edge->dest; > + gcc_assert (new_bb != NULL); > + > + for (bb = ENTRY_BLOCK_PTR; bb != NULL;) > + { > + edge_iterator ei; > + edge e; > + basic_block bb_next; > + bb_next = bb->next_bb; > + if (bb == EXIT_BLOCK_PTR) > + { > + VEC_truncate (edge, EXIT_BLOCK_PTR->preds, 0); > + } > + else if (bb == ENTRY_BLOCK_PTR) > + { > + VEC_truncate (edge, ENTRY_BLOCK_PTR->succs, 0); > + } > + else > + { > + remove_phi_nodes (bb); > + if (bb_seq (bb) != NULL) > + { > + gimple_stmt_iterator i; > + for (i = gsi_start_bb (bb); !gsi_end_p (i);) > + { > + gimple stmt = gsi_stmt (i); > + unlink_stmt_vdef (stmt); > + reset_debug_uses (stmt); > + gsi_remove (&i, true); > + release_defs (stmt); > + } > + } > + FOR_EACH_EDGE (e, ei, bb->succs) > + { > + n_edges--; > + ggc_free (e); > + } > + VEC_truncate (edge, bb->succs, 0); > + VEC_truncate (edge, bb->preds, 0); > + bb->prev_bb = NULL; > + bb->next_bb = NULL; > + if (bb == new_bb) > + { > + bb = bb_next; > + continue; > + } > + bb->il.gimple = NULL; > + SET_BASIC_BLOCK (bb->index, NULL); > + n_basic_blocks--; > + } > + bb = bb_next; > + } > + > + > + /* This is to allow iterating over the basic blocks. 
*/ > + new_bb->next_bb = EXIT_BLOCK_PTR; > + EXIT_BLOCK_PTR->prev_bb = new_bb; > + > + new_bb->prev_bb = ENTRY_BLOCK_PTR; > + ENTRY_BLOCK_PTR->next_bb = new_bb; > + > + gcc_assert (find_edge (new_bb, EXIT_BLOCK_PTR) == NULL); > + last_edge = make_edge (new_bb, EXIT_BLOCK_PTR, 0); > + gcc_assert (last_edge); > + > + gcc_assert (find_edge (ENTRY_BLOCK_PTR, new_bb) == NULL); > + last_edge = make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU); > + gcc_assert (last_edge); > + > + free_dominance_info (CDI_DOMINATORS); > + free_dominance_info (CDI_POST_DOMINATORS); > + calculate_dominance_info (CDI_DOMINATORS); > + calculate_dominance_info (CDI_POST_DOMINATORS); > + > + current_function_decl = old_current_function_decl; > + pop_cfun (); > + > + return new_bb; > +} > + > +/* Returns true if function FUNC_DECL contains abnormal goto statements. */ > + > +static bool > +function_can_make_abnormal_goto (tree func_decl) > +{ > + basic_block bb; > + FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (func_decl)) > + { > + gimple_stmt_iterator gsi; > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > + { > + gimple stmt = gsi_stmt (gsi); > + if (stmt_can_make_abnormal_goto (stmt)) > + return true; > + } > + } > + return false; > +} > + > +/* Has an entry for every cloned function and auxiliaries that have been > + generated by auto cloning. These cannot be further cloned. */ > + > +htab_t cloned_function_decls_htab = NULL; > + > +/* Adds function FUNC_DECL to the cloned_function_decls_htab. */ > + > +static void > +mark_function_not_cloneable (tree func_decl) > +{ > + void **slot; > + > + slot = htab_find_slot_with_hash (cloned_function_decls_htab, func_decl, > + htab_hash_pointer (func_decl), INSERT); > + gcc_assert (*slot == NULL); > + *slot = func_decl; > +} > + > +/* Entry point for the auto clone pass. Calls the target hook to determine > if > + this function must be cloned. 
*/ > + > +static unsigned int > +do_auto_clone (void) > +{ > + tree opt_node = NULL_TREE; > + int num_versions = 0; > + int i = 0; > + tree fn_ver_addr_chain = NULL_TREE; > + tree default_ver = NULL_TREE; > + tree cond_func_decl = NULL_TREE; > + tree cond_func_addr; > + tree default_decl; > + basic_block empty_bb; > + gimple_seq gseq = NULL; > + gimple_stmt_iterator gsi; > + tree selector_decl; > + tree ifunc_decl; > + void **slot; > + struct cgraph_node *node; > + > + node = cgraph_node (current_function_decl); > + > + if (lookup_attribute ("noclone", DECL_ATTRIBUTES (current_function_decl)) > + != NULL) > + { > + if (dump_file) > + fprintf (dump_file, "Not cloning, noclone attribute set\n"); > + return 0; > + } > + > + /* Check if function size is within permissible limits for cloning. */ > + if (node->global.size > + > PARAM_VALUE (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING)) > + { > + if (dump_file) > + fprintf (dump_file, "Function size exceeds auto cloning > threshold.\n"); > + return 0; > + } > + > + if (cloned_function_decls_htab == NULL) > + cloned_function_decls_htab = htab_create (10, htab_hash_pointer, > + htab_eq_pointer, NULL); > + > + > + /* If this function is a clone or other, like the selector function, pass. > */ > + slot = htab_find_slot_with_hash (cloned_function_decls_htab, > + current_function_decl, > + htab_hash_pointer (current_function_decl), > + INSERT); > + > + if (*slot != NULL) > + return 0; > + > + if (profile_status == PROFILE_READ > + && !hot_function_p (cgraph_node (current_function_decl))) > + return 0; > + > + /* Ignore functions with abnormal gotos, not correct to clone them. */ > + if (function_can_make_abnormal_goto (current_function_decl)) > + return 0; > + > + if (!targetm.mversion_function) > + return 0; > + > + /* Call the target hook to see if this function needs to be versioned. 
*/
> +  num_versions = targetm.mversion_function (current_function_decl, &opt_node,
> +                                            &cond_func_decl);
> +
> +  /* Nothing more to do if versions are not to be created.  */
> +  if (num_versions == 0)
> +    return 0;
> +
> +  mark_function_not_cloneable (cond_func_decl);
> +  copy_decl_attributes (cond_func_decl, current_function_decl);
> +
> +  /* Make as many clones as requested.  */
> +  for (i = 0; i < num_versions; ++i)
> +    {
> +      tree cloned_decl;
> +      char clone_name[100];
> +
> +      sprintf (clone_name, "autoclone.%d", i);
> +      cloned_decl = clone_function (current_function_decl, clone_name);
> +      gcc_assert (cloned_decl != NULL);
> +      fn_ver_addr_chain = tree_cons (build_fold_addr_expr (cloned_decl),
> +                                     NULL, fn_ver_addr_chain);
> +      mark_function_not_cloneable (cloned_decl);
> +      DECL_FUNCTION_SPECIFIC_TARGET (cloned_decl)
> +        = TREE_PURPOSE (opt_node);
> +      opt_node = TREE_CHAIN (opt_node);
> +    }
> +
> +  /* The current function is replaced by an ifunc call to the right version.
> +     Make another clone for the default.  */
> +  default_decl = clone_function (current_function_decl, "autoclone.original");
> +  mark_function_not_cloneable (default_decl);
> +  /* Empty the body of the current function.  */
> +  empty_bb = purge_function_body (current_function_decl);
> +  default_ver = build_fold_addr_expr (default_decl);
> +  cond_func_addr = build_fold_addr_expr (cond_func_decl);
> +
> +  /* Get the gimple sequence to replace the current function's body with an
> +     ifunc dispatch call to the right version.
*/
> +  gseq = dispatch_using_ifunc (num_versions, current_function_decl,
> +                               cond_func_addr, fn_ver_addr_chain,
> +                               default_ver, &selector_decl, &ifunc_decl);
> +
> +  mark_function_not_cloneable (selector_decl);
> +  mark_function_not_cloneable (ifunc_decl);
> +
> +  for (gsi = gsi_start (gseq); !gsi_end_p (gsi); gsi_next (&gsi))
> +    gimple_set_bb (gsi_stmt (gsi), empty_bb);
> +
> +  set_bb_seq (empty_bb, gseq);
> +
> +  if (dump_file)
> +    dump_function_to_file (current_function_decl, dump_file, TDF_BLOCKS);
> +
> +  update_ssa (TODO_update_ssa_no_phi);
> +
> +  return 0;
> +}
> +
> +static bool
> +gate_auto_clone (void)
> +{
> +  /* Turned on at -O2 and above.  */
> +  return optimize >= 2;
> +}
> +
> +struct gimple_opt_pass pass_auto_clone =
> +{
> + {
> +  GIMPLE_PASS,
> +  "auto_clone",                         /* name */
> +  gate_auto_clone,                      /* gate */
> +  do_auto_clone,                        /* execute */
> +  NULL,                                 /* sub */
> +  NULL,                                 /* next */
> +  0,                                    /* static_pass_number */
> +  TV_MVERSN_DISPATCH,                   /* tv_id */
> +  PROP_cfg,                             /* properties_required */
> +  PROP_cfg,                             /* properties_provided */
> +  0,                                    /* properties_destroyed */
> +  0,                                    /* todo_flags_start */
> +  TODO_dump_func |                      /* todo_flags_finish */
> +  TODO_cleanup_cfg | TODO_dump_cgraph |
> +  TODO_update_ssa | TODO_verify_ssa
> + }
> +};
> Index: passes.c
> ===================================================================
> --- passes.c	(revision 182355)
> +++ passes.c	(working copy)
> @@ -1278,6 +1278,7 @@ init_optimization_passes (void)
>    /* These passes are run after IPA passes on every function that is being
>       output to the assembler file.
*/
>    p = &all_passes;
> +  NEXT_PASS (pass_auto_clone);
>    NEXT_PASS (pass_direct_call_profile);
>    NEXT_PASS (pass_lower_eh_dispatch);
>    NEXT_PASS (pass_all_optimizations);
> Index: config/i386/i386.opt
> ===================================================================
> --- config/i386/i386.opt	(revision 182355)
> +++ config/i386/i386.opt	(working copy)
> @@ -101,6 +101,10 @@ march=
>  Target RejectNegative Joined Var(ix86_arch_string)
>  Generate code for given CPU
> 
> +mvarch=
> +Target RejectNegative Joined Var(ix86_mv_arch_string)
> +Multiversion for the given CPU(s)
> +
>  masm=
>  Target RejectNegative Joined Var(ix86_asm_string)
>  Use given assembler dialect
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c	(revision 182355)
> +++ config/i386/i386.c	(working copy)
> @@ -60,7 +60,11 @@ along with GCC; see the file COPYING3.  If not see
>  #include "fibheap.h"
>  #include "tree-flow.h"
>  #include "tree-pass.h"
> +#include "tree-dump.h"
> +#include "gimple-pretty-print.h"
>  #include "cfgloop.h"
> +#include "tree-scalar-evolution.h"
> +#include "tree-vectorizer.h"
> 
>  enum upper_128bits_state
>  {
> @@ -2353,6 +2357,8 @@ enum processor_type ix86_tune;
>  /* Which instruction set architecture to use.  */
>  enum processor_type ix86_arch;
> 
> +char ix86_varch[PROCESSOR_max];
> +
>  /* true if sse prefetch instruction is not NOOP.  */
>  int x86_prefetch_sse;
> 
> @@ -2492,6 +2498,7 @@ static enum calling_abi ix86_function_abi (const_t
>  /* Whether -mtune= or -march= were specified */
>  static int ix86_tune_defaulted;
>  static int ix86_arch_specified;
> +static int ix86_varch_specified;
> 
>  /* A mask of ix86_isa_flags that includes bit X if X
>     was set or cleared on the command line.  */
> @@ -4316,6 +4323,36 @@ ix86_option_override_internal (bool main_args_p)
>        /* Disable vzeroupper pass if TARGET_AVX is disabled.  */
>        target_flags &= ~MASK_VZEROUPPER;
>      }
> +
> +  /* Handle ix86_mv_arch_string.
The values allowed are the same as for
> +     -march=<>.  More than one value is allowed and values must be
> +     comma separated.  */
> +  if (ix86_mv_arch_string)
> +    {
> +      char *token;
> +      char *varch;
> +      int i;
> +
> +      ix86_varch_specified = 1;
> +      memset (ix86_varch, 0, sizeof (ix86_varch));
> +      token = XNEWVEC (char, strlen (ix86_mv_arch_string) + 1);
> +      strcpy (token, ix86_mv_arch_string);
> +      varch = strtok ((char *) token, ",");
> +      while (varch != NULL)
> +        {
> +          for (i = 0; i < pta_size; i++)
> +            if (!strcmp (varch, processor_alias_table[i].name))
> +              {
> +                ix86_varch[processor_alias_table[i].processor] = 1;
> +                break;
> +              }
> +          if (i == pta_size)
> +            error ("bad value (%s) for %svarch=%s %s",
> +                   varch, prefix, suffix, sw);
> +          varch = strtok (NULL, ",");
> +        }
> +      free (token);
> +    }
> +}
> 
>  /* Return TRUE if VAL is passed in register with 256bit AVX modes.  */
> @@ -26120,6 +26157,489 @@ ix86_fold_builtin (tree fndecl, int n_args ATTRIBU
>    return NULL_TREE;
>  }
> 
> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
> +   to return integer VERSION_NUM if the outcome of the function PREDICATE_DECL
> +   is true (or false if INVERT_CHECK is true).  This function will be called
> +   during version dispatch to decide which function version to execute.
*/
> +
> +static basic_block
> +add_condition_to_bb (tree function_decl, int version_num,
> +                     basic_block new_bb, tree predicate_decl,
> +                     bool invert_check)
> +{
> +  gimple return_stmt;
> +  gimple call_cond_stmt;
> +  gimple if_else_stmt;
> +
> +  basic_block bb1, bb2, bb3;
> +  edge e12, e23;
> +
> +  tree cond_var;
> +  gimple_seq gseq;
> +
> +  tree old_current_function_decl;
> +
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
> +  current_function_decl = function_decl;
> +
> +  gcc_assert (new_bb != NULL);
> +  gseq = bb_seq (new_bb);
> +
> +  if (predicate_decl == NULL_TREE)
> +    {
> +      return_stmt = gimple_build_return (build_int_cst (NULL, version_num));
> +      gimple_seq_add_stmt (&gseq, return_stmt);
> +      set_bb_seq (new_bb, gseq);
> +      gimple_set_bb (return_stmt, new_bb);
> +      pop_cfun ();
> +      current_function_decl = old_current_function_decl;
> +      return new_bb;
> +    }
> +
> +  cond_var = create_tmp_var (integer_type_node, NULL);
> +  call_cond_stmt = gimple_build_call (predicate_decl, 0);
> +  gimple_call_set_lhs (call_cond_stmt, cond_var);
> +  add_referenced_var (cond_var);
> +  mark_symbols_for_renaming (call_cond_stmt);
> +
> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl));
> +  gimple_set_bb (call_cond_stmt, new_bb);
> +  gimple_seq_add_stmt (&gseq, call_cond_stmt);
> +
> +  if (!invert_check)
> +    if_else_stmt = gimple_build_cond (GT_EXPR, cond_var,
> +                                      integer_zero_node,
> +                                      NULL_TREE, NULL_TREE);
> +  else
> +    if_else_stmt = gimple_build_cond (LE_EXPR, cond_var,
> +                                      integer_zero_node,
> +                                      NULL_TREE, NULL_TREE);
> +
> +  mark_symbols_for_renaming (if_else_stmt);
> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
> +  gimple_set_bb (if_else_stmt, new_bb);
> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
> +
> +  return_stmt = gimple_build_return (build_int_cst (NULL, version_num));
> +  gimple_seq_add_stmt (&gseq, return_stmt);
> +
> +  set_bb_seq (new_bb, gseq);
> +
> +  bb1 = new_bb;
> +  e12 = split_block (bb1, if_else_stmt);
> +  bb2 = e12->dest;
> +  e12->flags &= ~EDGE_FALLTHRU;
> +  e12->flags |= EDGE_TRUE_VALUE;
> +
> +  e23 = split_block (bb2, return_stmt);
> +  gimple_set_bb (return_stmt, bb2);
> +  bb3 = e23->dest;
> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
> +
> +  remove_edge (e23);
> +  make_edge (bb2, EXIT_BLOCK_PTR, 0);
> +
> +  free_dominance_info (CDI_DOMINATORS);
> +  free_dominance_info (CDI_POST_DOMINATORS);
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  calculate_dominance_info (CDI_POST_DOMINATORS);
> +  rebuild_cgraph_edges ();
> +  update_ssa (TODO_update_ssa);
> +  if (dump_file)
> +    dump_function_to_file (current_function_decl, dump_file, TDF_BLOCKS);
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;
> +
> +  return bb3;
> +}
> +
> +/* This makes an empty function with one empty basic block *CREATED_BB
> +   apart from the ENTRY and EXIT blocks.  */
> +
> +static tree
> +make_empty_function (basic_block *created_bb)
> +{
> +  tree decl, type, t;
> +  basic_block new_bb;
> +  tree old_current_function_decl;
> +  tree decl_name;
> +  char name[1000];
> +  static int num = 0;
> +
> +  /* The condition function should return an integer.  */
> +  type = build_function_type_list (integer_type_node, NULL_TREE);
> +
> +  sprintf (name, "cond_%d", num);
> +  num++;
> +  decl = build_fn_decl (name, type);
> +
> +  decl_name = get_identifier (name);
> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
> +  DECL_NAME (decl) = decl_name;
> +  gcc_assert (cgraph_node (decl) != NULL);
> +
> +  TREE_USED (decl) = 1;
> +  DECL_ARTIFICIAL (decl) = 1;
> +  DECL_IGNORED_P (decl) = 0;
> +  TREE_PUBLIC (decl) = 0;
> +  DECL_UNINLINABLE (decl) = 1;
> +  DECL_EXTERNAL (decl) = 0;
> +  DECL_CONTEXT (decl) = NULL_TREE;
> +  DECL_INITIAL (decl) = make_node (BLOCK);
> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
> +  TREE_READONLY (decl) = 0;
> +  DECL_PURE_P (decl) = 0;
> +
> +  /* Build result decl and add to function_decl.
*/
> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
> +  DECL_ARTIFICIAL (t) = 1;
> +  DECL_IGNORED_P (t) = 1;
> +  DECL_RESULT (decl) = t;
> +
> +  gimplify_function_tree (decl);
> +
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
> +  current_function_decl = decl;
> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
> +
> +  cfun->curr_properties |=
> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
> +     PROP_ssa);
> +
> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
> +
> +  /* This call is very important if this pass runs when the IR is in
> +     SSA form.  It breaks things in strange ways otherwise.  */
> +  init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
> +  init_ssa_operands ();
> +
> +  cgraph_add_new_function (decl, true);
> +  cgraph_call_function_insertion_hooks (cgraph_node (decl));
> +  cgraph_mark_needed_node (cgraph_node (decl));
> +
> +  if (dump_file)
> +    dump_function_to_file (decl, dump_file, TDF_BLOCKS);
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;
> +  *created_bb = new_bb;
> +  return decl;
> +}
> +
> +/* This function conservatively checks if loop LOOP is tree vectorizable.
> +   The code is adapted from tree-vectorizer.c and tree-vect-stmts.c.  */
> +
> +static bool
> +is_loop_form_vectorizable (struct loop *loop)
> +{
> +  /* Innermost loops should have two basic blocks.  */
> +  if (!loop->inner)
> +    {
> +      /* This is the innermost loop.  */
> +      if (loop->num_nodes != 2)
> +        return false;
> +      /* Empty loop.  */
> +      if (empty_block_p (loop->header))
> +        return false;
> +    }
> +  else
> +    {
> +      /* Bail if there are multiple nested loops.  */
> +      if ((loop->inner)->inner || (loop->inner)->next)
> +        return false;
> +      /* Recursive call for the inner loop.
*/
> +      if (!is_loop_form_vectorizable (loop->inner))
> +        return false;
> +      if (loop->num_nodes != 5)
> +        return false;
> +      /* The loop has zero iterations.  */
> +      if (TREE_INT_CST_LOW (number_of_latch_executions (loop)) == 0)
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* This function checks if there is at least one vectorizable
> +   load/store in loop LOOP.  Code adapted from tree-vect-stmts.c.  */
> +
> +static bool
> +is_loop_stmts_vectorizable (struct loop *loop)
> +{
> +  basic_block *body;
> +  unsigned int i;
> +  bool vect_load_store = false;
> +
> +  body = get_loop_body (loop);
> +
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
> +        {
> +          gimple stmt = gsi_stmt (gsi);
> +          enum gimple_code code = gimple_code (stmt);
> +
> +          if (gimple_has_volatile_ops (stmt))
> +            return false;
> +
> +          /* Does it have a vectorizable store or load in a hot bb?  */
> +          if (code == GIMPLE_ASSIGN)
> +            {
> +              enum tree_code lhs_code = TREE_CODE (gimple_assign_lhs (stmt));
> +              enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
> +
> +              /* Only look at hot vectorizable loads/stores.  */
> +              if (profile_status == PROFILE_READ
> +                  && !maybe_hot_bb_p (body[i]))
> +                continue;
> +
> +              if (lhs_code == ARRAY_REF
> +                  || lhs_code == INDIRECT_REF
> +                  || lhs_code == COMPONENT_REF
> +                  || lhs_code == IMAGPART_EXPR
> +                  || lhs_code == REALPART_EXPR
> +                  || lhs_code == MEM_REF)
> +                vect_load_store = true;
> +              else if (rhs_code == ARRAY_REF
> +                       || rhs_code == INDIRECT_REF
> +                       || rhs_code == COMPONENT_REF
> +                       || rhs_code == IMAGPART_EXPR
> +                       || rhs_code == REALPART_EXPR
> +                       || rhs_code == MEM_REF)
> +                vect_load_store = true;
> +            }
> +        }
> +    }
> +
> +  return vect_load_store;
> +}
> +
> +/* This function checks if there are any vectorizable loops present
> +   in CURRENT_FUNCTION_DECL.
This function is called before the
> +   loop optimization passes and is therefore very conservative in
> +   checking for vectorizable loops.  Also, the checks used in the
> +   vectorizer pass cannot be used here since many loop optimizations,
> +   which could change the loop structure and the stmts, have not yet
> +   run.
> +
> +   The conditions for a loop being vectorizable are adapted from
> +   tree-vectorizer.c and tree-vect-stmts.c.  */
> +
> +static bool
> +any_loops_vectorizable_with_load_store (void)
> +{
> +  unsigned int vect_loops_num;
> +  loop_iterator li;
> +  struct loop *loop;
> +  bool vectorizable_loop_found = false;
> +
> +  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
> +
> +  vect_loops_num = number_of_loops ();
> +
> +  /* Bail out if there are no loops.  */
> +  if (vect_loops_num <= 1)
> +    {
> +      loop_optimizer_finalize ();
> +      return false;
> +    }
> +
> +  scev_initialize ();
> +
> +  /* Iterate over all loops.  */
> +  FOR_EACH_LOOP (li, loop, 0)
> +    if (optimize_loop_nest_for_speed_p (loop))
> +      {
> +        if (!is_loop_form_vectorizable (loop))
> +          continue;
> +        if (!is_loop_stmts_vectorizable (loop))
> +          continue;
> +        vectorizable_loop_found = true;
> +        break;
> +      }
> +
> +  scev_finalize ();
> +  loop_optimizer_finalize ();
> +
> +  return vectorizable_loop_found;
> +}
> +
> +/* This makes the function that chooses the version to execute based
> +   on the condition.  This condition function will decide which version
> +   of the function to execute.  It should look like this:
> +
> +   int cond_i ()
> +   {
> +     __builtin_cpu_init ();  // Get the cpu type.
> +     a = __builtin_cpu_is_<type1> ();
> +     if (a)
> +       return 0;  // first version created.
> +     a = __builtin_cpu_is_<type2> ();
> +     if (a)
> +       return 1;  // second version created.
> +     ...
> +     return <num versions>;  // the default version.
> +   }
> +
> +   NEW_BB is the new last basic block of this function, to which more
> +   conditions can be added.  It is updated by this function.
*/
> +
> +static tree
> +make_condition_function (basic_block *new_bb)
> +{
> +  gimple ifunc_cpu_init_stmt;
> +  gimple_seq gseq;
> +  tree cond_func_decl;
> +  tree old_current_function_decl;
> +
> +  cond_func_decl = make_empty_function (new_bb);
> +
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (cond_func_decl));
> +  current_function_decl = cond_func_decl;
> +
> +  gseq = bb_seq (*new_bb);
> +
> +  /* Since this is possibly dispatched with IFUNC, call builtin_cpu_init
> +     explicitly, as the constructor will only fire after IFUNC
> +     initializers.  */
> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
> +    ix86_builtins[(int) IX86_BUILTIN_CPU_INIT], NULL);
> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
> +  gimple_set_bb (ifunc_cpu_init_stmt, *new_bb);
> +  set_bb_seq (*new_bb, gseq);
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;
> +  return cond_func_decl;
> +}
> +
> +/* Create a new target optimization node with tune set to ARCH_TUNE.  */
> +
> +static tree
> +create_mtune_target_opt_node (const char *arch_tune)
> +{
> +  struct cl_target_option target_options;
> +  const char *old_tune_string;
> +  tree optimization_node;
> +
> +  /* Build an optimization node that is the same as the current one except
> +     with "tune=arch_tune".  */
> +  cl_target_option_save (&target_options, &global_options);
> +  old_tune_string = ix86_tune_string;
> +
> +  ix86_tune_string = arch_tune;
> +  ix86_option_override_internal (false);
> +
> +  optimization_node = build_target_option_node ();
> +
> +  ix86_tune_string = old_tune_string;
> +  cl_target_option_restore (&global_options, &target_options);
> +
> +  return optimization_node;
> +}
> +
> +/* Should a version of this function be specially optimized for core2?
> +
> +   This function should have checks to see if there are any opportunities for
> +   core2 specific optimizations, otherwise do not create a clone.  The
> +   following opportunities are checked.
> +
> +   * Check if this function has vectorizable loads/stores, as it is known
> +     that unaligned 128-bit movs to/from memory (movdqu) are very expensive
> +     on core2, whereas later generations like corei7 have no additional
> +     overhead.
> +
> +   This versioning is triggered only when -ftree-vectorize is turned on
> +   and when multi-versioning for core2 is requested using -mvarch=core2.
> +
> +   Return false if no versioning is required.  Return true if a version must
> +   be created.  Generate the *OPTIMIZATION_NODE that must be used to optimize
> +   the newly created version, that is, tag "tune=core2" on the new version.  */
> +
> +static bool
> +mversion_for_core2 (tree *optimization_node,
> +                    tree *cond_func_decl, basic_block *new_bb)
> +{
> +  tree predicate_decl;
> +  bool is_mversion_target_core2 = false;
> +  bool create_version = false;
> +
> +  if (ix86_varch_specified
> +      && ix86_varch[PROCESSOR_CORE2_64])
> +    is_mversion_target_core2 = true;
> +
> +  /* Check for criteria to create a new version for core2.  */
> +
> +  /* If -ftree-vectorize is not used or MV is not requested, bail.  */
> +  if (flag_tree_vectorize && is_mversion_target_core2)
> +    {
> +      /* Check if there is at least one loop that has a vectorizable
> +         load/store.  These are the ones that can generate the unaligned
> +         mov which is known to be very slow on core2.  */
> +      if (any_loops_vectorizable_with_load_store ())
> +        create_version = true;
> +    }
> +  /* else if XXX: Add more criteria to version for core2.  */
> +
> +  if (!create_version)
> +    return false;
> +
> +  /* If the condition function's body has not been created, create it now.
*/
> +  if (*cond_func_decl == NULL)
> +    *cond_func_decl = make_condition_function (new_bb);
> +
> +  *optimization_node = create_mtune_target_opt_node ("core2");
> +
> +  predicate_decl = ix86_builtins[(int) IX86_BUILTIN_CPU_IS_INTEL_CORE2];
> +  *new_bb = add_condition_to_bb (*cond_func_decl, 0, *new_bb,
> +                                 predicate_decl, false);
> +  return true;
> +}
> +
> +/* Should this function CURRENT_FUNCTION_DECL be multi-versioned?  If so,
> +   the number of versions to be created (other than the original) is
> +   returned.  The outcome of COND_FUNC_DECL will decide the version to be
> +   executed.  The OPTIMIZATION_NODE_CHAIN has a unique node for each
> +   version to be created.  */
> +
> +static int
> +ix86_mversion_function (tree fndecl ATTRIBUTE_UNUSED,
> +                        tree *optimization_node_chain,
> +                        tree *cond_func_decl)
> +{
> +  basic_block new_bb;
> +  tree optimization_node;
> +  int num_versions_created = 0;
> +
> +  if (ix86_mv_arch_string == NULL)
> +    return 0;
> +
> +  if (mversion_for_core2 (&optimization_node, cond_func_decl, &new_bb))
> +    num_versions_created++;
> +
> +  if (!num_versions_created)
> +    return 0;
> +
> +  *optimization_node_chain = tree_cons (optimization_node,
> +                                        NULL_TREE, *optimization_node_chain);
> +
> +  /* Return the default version as the last stmt in cond_func_decl.  */
> +  if (*cond_func_decl != NULL)
> +    new_bb = add_condition_to_bb (*cond_func_decl, num_versions_created,
> +                                  new_bb, NULL_TREE, false);
> +
> +  return num_versions_created;
> +}
> +
>  /* A builtin to init/return the cpu type or feature.  Returns an
>     integer and the type is a const if IS_CONST is set.
*/
> 
> @@ -35608,6 +36128,9 @@ ix86_loop_unroll_adjust (unsigned nunroll, struct
>  #undef TARGET_FOLD_BUILTIN
>  #define TARGET_FOLD_BUILTIN ix86_fold_builtin
> 
> +#undef TARGET_MVERSION_FUNCTION
> +#define TARGET_MVERSION_FUNCTION ix86_mversion_function
> +
>  #undef TARGET_SLOW_UNALIGNED_VECTOR_MEMOP
>  #define TARGET_SLOW_UNALIGNED_VECTOR_MEMOP ix86_slow_unaligned_vector_memop
> 
> Index: params.def
> ===================================================================
> --- params.def	(revision 182355)
> +++ params.def	(working copy)
> @@ -1037,6 +1037,11 @@ DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
>  	  "While doing PMU profiling symbolize this many top addresses.",
>  	  50, 1, 10000)
> 
> +DEFPARAM (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING,
> +	  "autoclone-function-size-limit",
> +	  "Do not auto clone functions beyond this size.",
> +	  450, 0, 100000)
> +
>  /*
> Local variables:
> mode:c
> 
> -- 
> This patch is available for review at http://codereview.appspot.com/5490054