On Wed, Dec 11, 2013 at 10:15 AM, Bernhard Reutner-Fischer <rep.dot....@gmail.com> wrote:
>>> Patch updated with two more tests to check if the vfmadd insn is being >>> produced when possible. >>> >>> Thanks >>> Sri >>> >>> On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam <tmsri...@google.com> >>> wrote: >>>> Hi, >>>> >>>> I have attached a patch to fix >>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. >>>> >>>> Here is the problem. GCC adds target-specific builtins on demand. The >>>> FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via >>>> this declaration: >>>> >>>> void fun() __attribute__((target("fma"))); >>>> >>>> Specifically, the builtin __builtin_ia32_vfmaddpd gets added when >>>> ix86_add_new_builtins is called from ix86_valid_target_attribute_tree >>>> when processing this target attribute. >>>> >>>> Now, when the vectorizer is processing the builtin "__builtin_fma" in >>>> function other_fn(), it checks to see if this function is vectorizable >>>> and calls ix86_builtin_vectorized_function in i386.c. That returns the >>>> builtin stored here: >>>> >>>> >>>> case BUILT_IN_FMA: >>>> if (out_mode == DFmode && in_mode == DFmode) >>>> { >>>> if (out_n == 2 && in_n == 2) >>>> return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>>> .... >>>> >>>> ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE >>>> had the builtin not been added by the previous target attribute. That >>>> is why the code works if we remove the previous declaration. >>>> >>>> The fix is to not just return the builtin but to also check if the >>>> current function's isa allows the use of the builtin. For instance, >>>> this patch would solve the problem: >>>> >>>> @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, tre >>>> if (out_mode == DFmode && in_mode == DFmode) >>>> { >>>> if (out_n == 2 && in_n == 2) >>>> - return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>>> + { >>>> + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa >>>> + & global_options.x_ix86_isa_flags) >>>> + return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>>> + else >>>> + return NULL_TREE; >>>> + } >>>> >>>> >>>> but there are many instances of this usage in >>>> ix86_builtin_vectorized_function. This patch covers all the cases. >> >>> PR target/59390 >>> * gcc.target/i386/pr59390.c: New test. >>> * gcc.target/i386/pr59390_1.c: New test. >>> * gcc.target/i386/pr59390_2.c: New test. >>> * config/i386/i386.c (get_builtin): New function. >>> (ix86_builtin_vectorized_function): Replace all instances of >>> ix86_builtins[...] with get_builtin(...). >>> (ix86_builtin_reciprocal): Ditto. >> >> OK, with a couple of nits: >> >> +static tree get_builtin (enum ix86_builtins code) >> >> Please name this function ix86_get_builtin. >> >> + if (current_function_decl) >> + target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); >> + if (target_tree) >> + opts = TREE_TARGET_OPTION (target_tree); >> + else >> + opts = TREE_TARGET_OPTION (target_option_default_node); >> + >> >> opts = TREE_TARGET_OPTION (target_tree ? target_tree : >> target_option_default_node); > > Just curious: >> +static tree get_builtin (enum ix86_builtins code) >> +{ > [] >> +> > [] >> @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre >> if (out_mode == DFmode && in_mode == DFmode) >> { >> if (out_n == 2 && in_n == 2) >> - return ix86_builtins[IX86_BUILTIN_SQRTPD]; >> + get_builtin (IX86_BUILTIN_SQRTPD); >> else if (out_n == 4 && in_n == 4) >> - return ix86_builtins[IX86_BUILTIN_SQRTPD256]; >> + get_builtin (IX86_BUILTIN_SQRTPD256); >> } >> break; > > I must be missing something? > Don't you have to return ix86_get_builtin(...) ? Huh, I didn't even notice this mistake. Uros.