[PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies
Hi,
  This cleans up some of the naming around the vstrir and vstril
instruction definitions, with some cosmetic changes for consistency.
No functional changes.
Regtested just in case, no regressions. :-)

OK for trunk?

Thanks,

gcc/
    * config/rs6000/altivec.md (vstrir_code_): Rename to vstrir_internal_.
    (vstrir_p_code_): Rename to vstrir_p_internal_.
    (vstril_code_): Rename to vstril_internal_.
    (vstril_p_code_): Rename to vstril_p_internal_.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index efc8ae35c2e7..5aea02e9ad6e 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -884,44 +884,44 @@ (define_expand "vstrir_"
    (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
    UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+    emit_insn (gen_vstrir_internal_ (operands[0], operands[1]));
   else
-    emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+    emit_insn (gen_vstril_internal_ (operands[0], operands[1]));
   DONE;
 })

-(define_insn "vstrir_code_"
+(define_insn "vstrir_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
    (unspec:VIshort
    [(match_operand:VIshort 1 "altivec_register_operand" "v")]
    UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   "vstrir %0,%1"
   [(set_attr "type" "vecsimple")])

-;; This expands into same code as vstrir_ followed by condition logic
+;; This expands into same code as vstrir followed by condition logic
 ;; so that a single vstribr. or vstrihr. or vstribl. or vstrihl. instruction
 ;; can, for example, satisfy the needs of a vec_strir () function paired
 ;; with a vec_strir_p () function if both take the same incoming arguments.
 (define_expand "vstrir_p_"
   [(match_operand:SI 0 "gpc_reg_operand")
    (match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+    emit_insn (gen_vstrir_p_internal_ (scratch, operands[1]));
   else
-    emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+    emit_insn (gen_vstril_p_internal_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })

-(define_insn "vstrir_p_code_"
+(define_insn "vstrir_p_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
    (unspec:VIshort
    [(match_operand:VIshort 1 "altivec_register_operand" "v")]
    UNSPEC_VSTRIR))
    (set (reg:CC CR6_REGNO)
@@ -936,17 +936,17 @@ (define_expand "vstril_"
    (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
    UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+    emit_insn (gen_vstril_internal_ (operands[0], operands[1]));
   else
-    emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+    emit_insn (gen_vstrir_internal_ (operands[0], operands[1]));
   DONE;
 })

-(define_insn "vstril_code_"
+(define_insn "vstril_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
    (unspec:VIshort
    [(match_operand:VIshort 1 "altivec_register_operand" "v")]
    UNSPEC_VSTRIL))]
   "TARGET_POWER10"
@@ -962,18 +962,18 @@ (define_expand "vstril_p_"
    (match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+    emit_insn (gen_vstril_p_internal_ (scratch, operands[1]));
   else
-    emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+    emit_insn (gen_vstrir_p_internal_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })

-(define_insn "vstril_p_code_"
+(define_insn "vstril_p_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
    (unspec:VIshort
    [(match_operand:VIshort 1 "altivec_register_operand" "v")]
    UNSPEC_VSTRIL))
    (set (reg:CC CR6_REGNO)
Re: [PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies
On Wed, 2022-07-13 at 14:39 -0500, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Jul 13, 2022 at 01:18:29PM -0500, will schmidt wrote:
> > This cleans up some of the naming around the vstrir and vstril
> > instruction definitions, with some cosmetic changes for
> > consistency.
>
> > gcc/
> >     * config/rs6000/altivec.md (vstrir_code_): Rename
> >     to vstrir_internal_.
> >     (vstrir_p_code_): Rename to vstrir_p_internal_.
> >     (vstril_code_): Rename to vstril_internal_.
> >     (vstril_p_code_): Rename to vstril_p_internal_.
>
> It doesn't show the new names on the lhs this way. One way to do better
> is to write e.g.
>     (vstril_code_): Rename to...
>     (vstril_internal_): ... this.

Ok.

>
> It often is a good idea to say "... for VIshort" and similar btw.

Ok.

>
> I'm not a fan of "internal" either, it doesn't say anything. At least
> put it at the very end of the names please?

I'm easily convinced. ;-)
I wonder if I should just drop "_internal" entirely and go with
"vstrir_". Otherwise I'll rework to be "vstrir__internal".

At a glance I see we do have some other existing define_insn entries
with _internal at the tail and a few others embedded in the middle.
I'll leave a note and perhaps review those after. :-)

Thanks,
-Will

>
> Okay for trunk with that changed. Thanks!
>
>
> Segher
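
For readers following the rename, the endian dispatch that the define_expand bodies in the patch perform can be sketched in plain C. This is an illustration only, not GCC code: both enums and the function select_strip_insn are hypothetical names, and the sketch models only which instruction pattern gets picked, not the RTL that is emitted.

```c
#include <stdbool.h>

/* Hypothetical tags for the two instruction patterns; in altivec.md these
   correspond to the define_insn patterns that the patch renames.  */
enum strip_insn { INSN_VSTRIR, INSN_VSTRIL };

/* Which expander was invoked (illustrative names only).  */
enum strip_request { REQ_STRIR, REQ_STRIL };

/* Mirror of the "if (BYTES_BIG_ENDIAN) ... else ..." bodies in the patch:
   the requested direction maps straight through on big endian and is
   swapped on little endian.  */
static enum strip_insn
select_strip_insn (enum strip_request req, bool bytes_big_endian)
{
  if (req == REQ_STRIR)
    return bytes_big_endian ? INSN_VSTRIR : INSN_VSTRIL;
  return bytes_big_endian ? INSN_VSTRIL : INSN_VSTRIR;
}
```

Whatever suffix the insn patterns end up with, only the gen_* names inside the expanders change; the selection logic sketched here stays the same, which is why the patch can claim no functional changes.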
[PATCH, rs6000] Additional cleanup of rs6000_builtin_mask
Hi,
  Post the rs6000 builtins rewrite, some of the leftover builtin code is
redundant and can be removed.
  This replaces the remaining usage of bu_mask in
rs6000_target_modify_macros() with checks against the rs6000_cpu directly.
Thusly the bu_mask variable can be removed. After that variable
is eliminated there are no other uses of rs6000_builtin_mask_calculate(),
so that function can also be safely removed.

I have tested this on current systems (P8,P9,P10) without regressions.

OK for trunk?

Thanks,
-Will

gcc/
    * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Remove
    bu_mask references.
    (rs6000_define_or_undefine_macro): Replace bu_mask reference
    with a rs6000_cpu value check.
    (rs6000_cpu_cpp_builtins): Remove rs6000_builtin_mask_calculate()
    parameter from call to rs6000_target_modify_macros.
    * config/rs6000/rs6000-protos.h (rs6000_target_modify_macros,
    rs6000_target_modify_macros_ptr): Remove parameter from extern
    for the prototype.
    * config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Remove
    parameter from prototype, update calls to this function.
    (rs6000_print_builtin_options): Remove prototype, call and function.
    (rs6000_builtin_mask_calculate): Remove function.
    (rs6000_debug_reg_global): Remove call to rs6000_print_builtin_options.
    (rs6000_option_override_internal): Remove rs6000_builtin_mask var
    and builtin_mask debug output.
    (rs6000_pragma_target_parse): Update calls to
    rs6000_target_modify_ptr.

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 0d13645040ff..4d051b906582 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -333,24 +333,20 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
   else
     cpp_undef (parse_in, name);
 }

 /* Define or undefine macros based on the current target. If the user does
-   #pragma GCC target, we need to adjust the macros dynamically. Note, some of
-   the options needed for builtins have been moved to separate variables, so
-   have both the target flags and the builtin flags as arguments. */
+   #pragma GCC target, we need to adjust the macros dynamically. */

 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
-                             HOST_WIDE_INT bu_mask)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
     fprintf (stderr,
-             "rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX
-             ", " HOST_WIDE_INT_PRINT_HEX ")\n",
+             "rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX ")\n",
              (define_p) ? "define" : "undef",
-             flags, bu_mask);
+             flags);

   /* Each of the flags mentioned below controls whether certain preprocessor
      macros will be automatically defined when preprocessing source files for
      compilation by this compiler. While most of these flags can be enabled or
      disabled
@@ -593,14 +589,12 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
   /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or
      via the target attribute/pragma. */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
     rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");

-  /* options from the builtin masks. */
-  /* Note that OPTION_MASK_FPRND is enabled only if
-     (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell). */
-  if ((bu_mask & OPTION_MASK_FPRND) != 0)
+  /* Tell the user if we are targeting CELL. */
+  if (rs6000_cpu == PROCESSOR_CELL)
     rs6000_define_or_undefine_macro (define_p, "__PPU__");

   /* Tell the user if we support the MMA instructions. */
   if ((flags & OPTION_MASK_MMA) != 0)
     rs6000_define_or_undefine_macro (define_p, "__MMA__");
@@ -614,12 +608,11 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
 void
 rs6000_cpu_cpp_builtins (cpp_reader *pfile)
 {
   /* Define all of the common macros. */
-  rs6000_target_modify_macros (true, rs6000_isa_flags,
-                               rs6000_builtin_mask_calculate ());
+  rs6000_target_modify_macros (true, rs6000_isa_flags);

   if (TARGET_FRE)
     builtin_define ("__RECIP__");
   if (TARGET_FRES)
     builtin_define ("__RECIPF__");
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3ea010236090..b3c16e7448d8 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -318,13 +318,12 @@ extern void rs6000_pragma_longcall (struct cpp_reader *);
 extern void rs6000_cpu_cpp_builtins (struct cpp_reader *);
 #ifdef TREE_CODE
 extern bool rs6000_pragma_target_parse (tree, tree);
 #endif
 extern void rs6000_activate_target_options (tree new_tree);
-extern void rs6000
Re: [PATCH, rs6000] Additional cleanup of rs6000_builtin_mask
On Thu, 2022-07-14 at 11:28 +0800, Kewen.Lin wrote:
> Hi Will,
>
> Thanks for the cleanup! Some comments are inlined.

Hi,
Thanks for the review. A few comments and responses below.
TLDR I'll incorporate the suggestions in V2 that will show up ... after. :-)

>
> on 2022/7/14 05:39, will schmidt wrote:
> > [PATCH, rs6000] Additional cleanup of rs6000_builtin_mask
> >
> > Hi,
> > Post the rs6000 builtins rewrite, some of the leftover builtin
> > code is redundant and can be removed.
> > This replaces the remaining usage of bu_mask in
> > rs6000_target_modify_macros() with checks against the rs6000_cpu
> > directly.
> > Thusly the bu_mask variable can be removed. After that variable
> > is eliminated there are no other uses of
> > rs6000_builtin_mask_calculate(),
> > so that function can also be safely removed.
> >
>
> The TargetVariable rs6000_builtin_mask in rs6000.opt is useless, it seems
> it can be removed together?

Yes, if I also remove usage of x_rs6000_builtin_mask. There are a few
remaining references to x_r_b_m, but those appear safe to remove after
this cleanup as well. I'll confirm and likely include the removal in V2.

>
> > I have tested this on current systems (P8,P9,P10) without
> > regressions.
> >
> > OK for trunk?
> >
> >
> > Thanks,
> > -Will
> >
> >
> >
> > -  /* Set the builtin mask of the various options used that could affect which
> > -     builtins were used. In the past we used target_flags, but we've run out
> > -     of bits, and some options are no longer in target_flags. */
> > -  rs6000_builtin_mask = rs6000_builtin_mask_calculate ();
> > -  if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
> > -    rs6000_print_builtin_options (stderr, 0, "builtin mask",
> > -                                  rs6000_builtin_mask);
> > -
>
> I wonder if it's a good idea to still dump some information for built-in
> functions debugging even with new bif framework, it can be handled in a
> separated patch if yes. The new bif framework adopts stanzas for bif
> guarding, if we want to do similar things, we can refer to the code like:
>   TARGET_POPCNTB means all bifs with ENB_P5 are available
>   TARGET_CMPB means all bifs with ENB_P6 are available
>   ...
>
> , dump information like "current enabled stanzas: ENB_xx, ENB_xxx, ..."
> (even without ENB_ prefix).

Possibly. There does exist some debug already, and I still have some
work in progress related to some of the OPTION and TARGET handling.
I'll keep this in mind as I continue poking in this space. :-)

> >    /* Initialize all of the registers. */
> >    rs6000_init_hard_regno_mode_ok (global_init_p);
> >
> >    /* Save the initial options in case the user does function specific options */
> >    if (global_init_p)
> > @@ -24495,17 +24442,15 @@ rs6000_pragma_target_parse (tree args, tree pop_target)
> >
> >    if ((diff_flags != 0) || (diff_bumask != 0))
> >      {
> >        /* Delete old macros. */
> >        rs6000_target_modify_macros_ptr (false,
> > -                                       prev_flags & diff_flags,
> > -                                       prev_bumask & diff_bumask);
> > +                                       prev_flags & diff_flags);
> >
> >        /* Define new macros. */
> >        rs6000_target_modify_macros_ptr (true,
> > -                                       cur_flags & diff_flags,
> > -                                       cur_bumask & diff_bumask);
> > +                                       cur_flags & diff_flags);
> >      }
> >  }
> >
> >    return true;
> >  }
> > @@ -24732,19 +24677,10 @@ rs6000_print_isa_options (FILE *file, int indent, const char *string,
> >    rs6000_print_options_internal (file, indent, string, flags, "-m",
> >                                   &rs6000_opt_masks[0],
> >                                   ARRAY_SIZE (rs6000_opt_masks));
> >  }
> >
> > -static void
> > -rs6000_print_builtin_options (FILE *file, int indent, const char *string,
> > -                              HOST_WIDE_INT flags)
> > -{
> > -  rs6000_print_options_internal (file, indent, string, flags, "",
> > -                                 &rs6000_builtin_mask_names[0],
> > -                                 ARRAY_SIZE (rs6000_builtin_mask_names));
> > -}
>
> rs6000_builtin_mask_names becomes useless too, can be removed too?

It can. I'll include removal in V2.

Thanks
-Will

>
> BR,
> Kewen
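
The rs6000_pragma_target_parse hunk quoted above undefines the old macros and defines the new ones from the ISA flags alone. A minimal C sketch of that bookkeeping follows. It is not the GCC code: the struct macro_delta, the function compute_macro_delta, and the use of uint64_t in place of HOST_WIDE_INT are all hypothetical, and computing the changed bits with XOR is an assumption, since the quoted hunk does not show how diff_flags is derived.

```c
#include <stdint.h>

/* Hypothetical result type: which flag bits need their macros undefined
   and which need their macros defined after a "#pragma GCC target".  */
struct macro_delta
{
  uint64_t to_undef;   /* bits that were set before and have changed */
  uint64_t to_define;  /* bits that are set now and have changed */
};

/* Sketch of the two rs6000_target_modify_macros_ptr calls in the hunk:
   (false, prev_flags & diff_flags) and (true, cur_flags & diff_flags).
   Assumes diff_flags is the XOR of the two flag words.  */
static struct macro_delta
compute_macro_delta (uint64_t prev_flags, uint64_t cur_flags)
{
  uint64_t diff_flags = prev_flags ^ cur_flags;
  struct macro_delta d;
  d.to_undef = prev_flags & diff_flags;
  d.to_define = cur_flags & diff_flags;
  return d;
}
```

With the builtin mask gone, one flag word is enough here; bits that are set both before and after the pragma fall out of both masks, so their macros are left alone.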
[PATCH, rs6000, v2] Additional cleanup of rs6000_builtin_mask
Hi,
  Post the rs6000 builtins rewrite, some of the leftover builtin code is
redundant and can be removed.
  This replaces the usage of bu_mask in rs6000_target_modify_macros
with checks against the rs6000_isa_flags equivalent directly. Thusly
the bu_mask variable can be removed. After this update there are no
other uses of rs6000_builtin_mask_calculate, so that function can also
be safely removed.

No functional change, though some output under debug has been removed.

[V2] Per patch review and subsequent investigations, the
rs6000_builtin_mask and x_rs6000_builtin_mask can also be removed, as
well as the entirety of the rs6000_builtin_mask_names table.

gcc/
    * config/rs6000/rs6000-c.cc: Update comments.
    (rs6000_target_modify_macros): Remove bu_mask references.
    (rs6000_define_or_undefine_macro): Replace bu_mask reference
    with a rs6000_cpu value check.
    (rs6000_cpu_cpp_builtins): Remove rs6000_builtin_mask_calculate()
    parameter from call to rs6000_target_modify_macros.
    * config/rs6000/rs6000-protos.h (rs6000_target_modify_macros,
    rs6000_target_modify_macros_ptr): Remove parameter from extern
    for the prototype.
    * config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Remove
    parameter from prototype, update calls to this function.
    (rs6000_print_builtin_options): Remove prototype, call and function.
    (rs6000_builtin_mask_calculate): Remove function.
    (rs6000_debug_reg_global): Remove call to rs6000_print_builtin_options.
    (rs6000_option_override_internal): Remove rs6000_builtin_mask var
    and builtin_mask debug output.
    (rs6000_builtin_mask_names): Remove.
    (rs6000_pragma_target_parse): Remove prev_bumask, cur_bumask,
    diff_bumask references; Update calls to rs6000_target_modify_ptr.
    * config/rs6000/rs6000.opt (rs6000_builtin_mask): Remove.

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 0d13645040ff..4d051b906582 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -333,24 +333,20 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
   else
     cpp_undef (parse_in, name);
 }

 /* Define or undefine macros based on the current target. If the user does
-   #pragma GCC target, we need to adjust the macros dynamically. Note, some of
-   the options needed for builtins have been moved to separate variables, so
-   have both the target flags and the builtin flags as arguments. */
+   #pragma GCC target, we need to adjust the macros dynamically. */

 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
-                             HOST_WIDE_INT bu_mask)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
     fprintf (stderr,
-             "rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX
-             ", " HOST_WIDE_INT_PRINT_HEX ")\n",
+             "rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX ")\n",
              (define_p) ? "define" : "undef",
-             flags, bu_mask);
+             flags);

   /* Each of the flags mentioned below controls whether certain preprocessor
      macros will be automatically defined when preprocessing source files for
      compilation by this compiler. While most of these flags can be enabled or
      disabled
@@ -593,14 +589,12 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
   /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or
      via the target attribute/pragma. */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
     rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");

-  /* options from the builtin masks. */
-  /* Note that OPTION_MASK_FPRND is enabled only if
-     (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell). */
-  if ((bu_mask & OPTION_MASK_FPRND) != 0)
+  /* Tell the user if we are targeting CELL. */
+  if (rs6000_cpu == PROCESSOR_CELL)
     rs6000_define_or_undefine_macro (define_p, "__PPU__");

   /* Tell the user if we support the MMA instructions. */
   if ((flags & OPTION_MASK_MMA) != 0)
     rs6000_define_or_undefine_macro (define_p, "__MMA__");
@@ -614,12 +608,11 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
 void
 rs6000_cpu_cpp_builtins (cpp_reader *pfile)
 {
   /* Define all of the common macros. */
-  rs6000_target_modify_macros (true, rs6000_isa_flags,
-                               rs6000_builtin_mask_calculate ());
+  rs6000_target_modify_macros (true, rs6000_isa_flags);

   if (TARGET_FRE)
     builtin_define ("__RECIP__");
   if (TARGET_FRES)
     builtin_define ("__RECIPF__");
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-pr
[PATCH, rs6000, v2] Cleanup some vstrir define_expand naming inconsistencies
Hi,
  This cleans up some of the naming around the vstrir and vstril
instruction definitions, with some cosmetic changes for consistency.
No functional changes.
Regtested just in case, no regressions.

[V2] Used 'direct' instead of 'internal', and cosmetically reworked the
changelog.

OK for trunk?

Thanks,

gcc/
    * config/rs6000/altivec.md: (vstrir_code_): Rename to...
    (vstrir_direct_): ... this.
    (vstrir_p_code_): Rename to...
    (vstrir_p_direct_): ... this.
    (vstril_code_): Rename to...
    (vstril_direct_): ... this.
    (vstril_p_code_): Rename to...
    (vstril_p_direct_): ... this.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index efc8ae35c2e7..2c4940f2e21c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -884,44 +884,44 @@ (define_expand "vstrir_"
    (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
    UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+    emit_insn (gen_vstrir_direct_ (operands[0], operands[1]));
   else
-    emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+    emit_insn (gen_vstril_direct_ (operands[0], operands[1]));
   DONE;
 })

-(define_insn "vstrir_code_"
+(define_insn "vstrir_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
    (unspec:VIshort
    [(match_operand:VIshort 1 "altivec_register_operand" "v")]
    UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   "vstrir %0,%1"
   [(set_attr "type" "vecsimple")])

-;; This expands into same code as vstrir_ followed by condition logic
+;; This expands into same code as vstrir followed by condition logic
 ;; so that a single vstribr. or vstrihr. or vstribl. or vstrihl. instruction
 ;; can, for example, satisfy the needs of a vec_strir () function paired
 ;; with a vec_strir_p () function if both take the same incoming arguments.
 (define_expand "vstrir_p_"
   [(match_operand:SI 0 "gpc_reg_operand")
    (match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+    emit_insn (gen_vstrir_p_direct_ (scratch, operands[1]));
   else
-    emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+    emit_insn (gen_vstril_p_direct_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })

-(define_insn "vstrir_p_code_"
+(define_insn "vstrir_p_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
    (unspec:VIshort
    [(match_operand:VIshort 1 "altivec_register_operand" "v")]
    UNSPEC_VSTRIR))
    (set (reg:CC CR6_REGNO)
@@ -936,17 +936,17 @@ (define_expand "vstril_"
    (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
    UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+    emit_insn (gen_vstril_direct_ (operands[0], operands[1]));
   else
-    emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+    emit_insn (gen_vstrir_direct_ (operands[0], operands[1]));
   DONE;
 })

-(define_insn "vstril_code_"
+(define_insn "vstril_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
    (unspec:VIshort
    [(match_operand:VIshort 1 "altivec_register_operand" "v")]
    UNSPEC_VSTRIL))]
   "TARGET_POWER10"
@@ -962,18 +962,18 @@ (define_expand "vstril_p_"
    (match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+    emit_insn (gen_vstril_p_direct_ (scratch, operands[1]));
   else
-    emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+    emit_insn (gen_vstrir_p_direct_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })

-(define_insn "vstril_p_code_"
+(define_insn "vstril_p_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
    (unspec:VIshort
    [(match_operand:VIshort 1 "altivec_register_operand" "v")]
    UNSPEC_VSTRIL))
    (set (reg:CC CR6_REGNO)
[PATCH, rs6000] Deprecate unnecessary __builtin_dfp_dtstsfi_*_dd and td overloads
Hi,
  Noted as part of the work-in-progress builtins rewrite, the
__builtin_dfp_dtstsfi_*_{dd,td} builtins are redundant, and are thusly
being marked as deprecated. They will be removed as part of the
builtins rewrite sometime in the future.

This includes the builtins
__builtin_dfp_dtstsfi_eq_dd, __builtin_dfp_dtstsfi_gt_dd,
__builtin_dfp_dtstsfi_lt_dd, __builtin_dfp_dtstsfi_ov_dd,
__builtin_dfp_dtstsfi_eq_td, __builtin_dfp_dtstsfi_gt_td,
__builtin_dfp_dtstsfi_lt_td, and __builtin_dfp_dtstsfi_ov_td.

Regtests underway.
OK for trunk?

Thanks
-Will

--

gcc/ChangeLog:
    * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
    Mark builtins P9_BUILTIN_DFP_TSTSFI_LT_DD, P9_BUILTIN_DFP_TSTSFI_EQ_DD,
    P9_BUILTIN_DFP_TSTSFI_GT_DD, P9_BUILTIN_DFP_TSTSFI_OV_DD,
    P9_BUILTIN_DFP_TSTSFI_LT_TD, P9_BUILTIN_DFP_TSTSFI_EQ_TD,
    P9_BUILTIN_DFP_TSTSFI_GT_TD, P9_BUILTIN_DFP_TSTSFI_OV_TD as deprecated.
    * doc/extend.texi: Update examples to indicate deprecated functions.

testsuite/ChangeLog:
    * gcc.target/powerpc/dfp/dtstsfi-10.c: Mark
    __builtin_dfp_dtstsfi_*_{dd,td} calls as deprecated.
    * gcc.target/powerpc/dfp/dtstsfi-11.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-12.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-13.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-14.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-15.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-16.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-17.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-18.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-19.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-30.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-31.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-32.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-33.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-34.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-35.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-36.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-37.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-38.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-39.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-50.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-51.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-52.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-53.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-54.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-55.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-56.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-57.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-58.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-59.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-70.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-71.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-72.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-73.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-74.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-75.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-76.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-77.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-78.c: Same.
    * gcc.target/powerpc/dfp/dtstsfi-79.c: Same.
    * gcc.target/powerpc/pr92661.c: Same.

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index cdc64bd63c66..9a79e5684f20 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -946,10 +946,21 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
   else if (fcode == ALTIVEC_BUILTIN_VEC_LVSR && !BYTES_BIG_ENDIAN)
     warning (OPT_Wdeprecated,
              "% is deprecated for little endian; use "
              "assignment for unaligned loads and stores");

+  if (fcode == P9_BUILTIN_DFP_TSTSFI_LT_DD
+      || fcode == P9_BUILTIN_DFP_TSTSFI_EQ_DD
+      || fcode == P9_BUILTIN_DFP_TSTSFI_GT_DD
+      || fcode == P9_BUILTIN_DFP_TSTSFI_OV_DD
+      || fcode == P9_BUILTIN_DFP_TSTSFI_LT_TD
+      || fcode == P9_BUILTIN_DFP_TSTSFI_EQ_TD
+      || fcode == P9_BUILTIN_DFP_TSTSFI_GT_TD
+      || fcode == P9_BUILTIN_DFP_TSTSFI_OV_TD)
+    warning (OPT_Wdeprecated, "builtin '%s' is deprecated",
+             IDENTIFIER_POINTER (DECL_NAME (fndecl)));
+
   if (fcode == ALTIVEC_BUILTIN_VEC_MUL)
     {
       /* vec_mul needs to be special cased because there are no instructions
          for it for the {un}signed char, {un}signed short, and {un}signed int
          types. */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8daa1c679748..90db01daeac6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17859,31 +17859,33 @@ int __builtin_byte_in_set (unsigned char u, unsigned long long set);
 int __builtin_byte_in_range (unsigned char u, unsigned int range);
 int __builtin_byte_in_either_range (unsigned char u, unsigned int ranges);

 int __builtin_dfp_dtstsfi_lt (unsigned int com
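
The deprecation check added to altivec_resolve_overloaded_builtin in the patch above can be sketched as a standalone C predicate. This is only an illustration: the enum dfp_fcode and its enumerators are hypothetical stand-ins for the P9_BUILTIN_DFP_TSTSFI_* codes, and format_deprecation imitates the text of the warning with snprintf rather than calling GCC's warning ().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical function codes.  The first stands for an overloaded form
   that stays supported; the rest stand for the explicit _dd/_td forms
   that the patch marks as deprecated.  */
enum dfp_fcode
{
  FC_TSTSFI_LT,
  FC_TSTSFI_LT_DD, FC_TSTSFI_EQ_DD, FC_TSTSFI_GT_DD, FC_TSTSFI_OV_DD,
  FC_TSTSFI_LT_TD, FC_TSTSFI_EQ_TD, FC_TSTSFI_GT_TD, FC_TSTSFI_OV_TD
};

/* True for the eight explicit _dd/_td codes, mirroring the chain of
   "fcode == ..." tests in the patch.  */
static bool
is_deprecated_dtstsfi (enum dfp_fcode code)
{
  switch (code)
    {
    case FC_TSTSFI_LT_DD: case FC_TSTSFI_EQ_DD:
    case FC_TSTSFI_GT_DD: case FC_TSTSFI_OV_DD:
    case FC_TSTSFI_LT_TD: case FC_TSTSFI_EQ_TD:
    case FC_TSTSFI_GT_TD: case FC_TSTSFI_OV_TD:
      return true;
    default:
      return false;
    }
}

/* Build the message text used by the patch.  Returns what snprintf
   returns: the length the full message needs, excluding the NUL.  */
static int
format_deprecation (char *buf, size_t len, const char *name)
{
  return snprintf (buf, len, "builtin '%s' is deprecated", name);
}
```

A caller would test is_deprecated_dtstsfi first and only then format the message, so the overloaded forms documented in extend.texi never trigger the warning.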
Re: [PATCH] rs6000: Fix vec insert ilp32 ICE and test failures [PR98799]
On Tue, 2021-01-26 at 01:46 -0600, Xionghu Luo via Gcc-patches wrote:
> From: "luo...@cn.ibm.com"
>
> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
> variable vector insert. Remove rs6000_expand_vector_set_var helper
> function, adjust the p8 and p9 definitions position and make them
> static.
>
> The previous commit r11-6858 missed check m32, This patch is tested pass
> on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with
> RUNTESTFLAGS="--target_board =unix'{-m32,-m64}" for BE targets.
>
> gcc/ChangeLog:
>
> 2021-01-26 Xionghu Luo
>            David Edelsohn
>
>     PR target/98799
>     * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>     Don't generate VIEW_CONVERT_EXPR for m32.

This is hinted at in the description, but would be good to be clear in
the changelog too. Consider something like
"Don't generate VIEW_CONVERT_EXPR for fcode ALTIVEC_BUILTIN_VEC_INSERT
when -m32."

>     * config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
>     Delete.
>     * config/rs6000/rs6000.c (rs6000_expand_vector_set): Remove the
>     wrapper call rs6000_expand_vector_set_var. Call
>     rs6000_expand_vector_set_var_p9 and rs6000_expand_vector_set_var_p8
>     directly.
>     (rs6000_expand_vector_set_var): Delete.

The diff conflates the deleted function with the changes to an existing
function, making it harder to sort out...

Was/is deleting the rs6000_expand_vector_set_var() helper necessary for
this fix, or just cleanup?

Add:
    (rs6000_expand_vector_set_var_p9): Make static.
    (rs6000_expand_vector_set_var_p8): Make static.

>
> gcc/testsuite/ChangeLog:
>
> 2021-01-26 Xionghu Luo
>
>     PR target/98827
>     * gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust ilp32.
>     * gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
>     * gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
>     * gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
>     * gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
>     * gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
>     * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
>     * gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
>     * gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
>     * gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
>     * gcc.target/powerpc/pr79251.p8.c: Likewise.
>     * gcc.target/powerpc/pr79251.p9.c: Likewise.
>     * gcc.target/powerpc/vsx-builtin-7.c: Likewise.

Just a glance, those changes look OK.

> ---
>  gcc/config/rs6000/rs6000-c.c                  |   2 +-
>  gcc/config/rs6000/rs6000-protos.h             |   1 -
>  gcc/config/rs6000/rs6000.c                    | 236 +-
>  .../powerpc/fold-vec-insert-char-p8.c         |  14 +-
>  .../powerpc/fold-vec-insert-char-p9.c         |   6 +-
>  .../powerpc/fold-vec-insert-double.c          |  10 +-
>  .../powerpc/fold-vec-insert-float-p8.c        |  12 +-
>  .../powerpc/fold-vec-insert-float-p9.c        |   6 +-
>  .../powerpc/fold-vec-insert-int-p8.c          |  13 +-
>  .../powerpc/fold-vec-insert-int-p9.c          |   9 +-
>  .../powerpc/fold-vec-insert-longlong.c        |   8 +-
>  .../powerpc/fold-vec-insert-short-p8.c        |  10 +-
>  .../powerpc/fold-vec-insert-short-p9.c        |  13 +-
>  gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 +-
>  gcc/testsuite/gcc.target/powerpc/pr79251.p9.c |  16 +-
>  .../gcc.target/powerpc/vsx-builtin-7.c        |   2 +-
>  16 files changed, 203 insertions(+), 172 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index f6ee1e61b56..656cdb39f3f 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -1600,7 +1600,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>        stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
>      }
>
> -      if (TARGET_P8_VECTOR)
> +      if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT)
>      {
>        stmt = build_array_ref (loc, stmt, arg2);
>        stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 9a46a414743..9cca7325d0d 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -58,7 +58,6 @@ extern bool rs6000_split_128bit_ok_p (rtx []);
>  extern void rs6000_expand_float128_convert (rtx, rtx, bool);
>  extern void rs6000_expand_vector_init (rtx, rtx);
>  extern void rs6000_expand_vector_set (rtx, rtx, rtx);
> -extern void rs6000_expand_vector_set_var (rtx, rtx, rtx);
>  extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
>  extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
>  extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
> dif
Re: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.
On Thu, 2021-01-14 at 11:59 -0500, Michael Meissner via Gcc-patches wrote: > From 78435dee177447080434cdc08fc76b1029c7f576 Mon Sep 17 00:00:00 2001 > From: Michael Meissner > Date: Wed, 13 Jan 2021 21:47:03 -0500 > Subject: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins. > > This patch replaces patches previously submitted: > > September 24th, 2020: > Message-ID: <20200924203159.ga31...@ibm-toto.the-meissners.org> > > October 9th, 2020: > Message-ID: <20201009043543.ga11...@ibm-toto.the-meissners.org> > > October 24th, 2020: > Message-ID: <2020100346.ga8...@ibm-toto.the-meissners.org> > > November 19th, 2020: > Message-ID: <20201119235814.ga...@ibm-toto.the-meissners.org> Subject and date should be sufficient _if_ having the old versions of the patchs are necessary to review the latest version of the patch. Which ideally is not the case. > > This patch maps the built-in functions that take or return long double > arguments on systems where long double is IEEE 128-bit. > > If long double is IEEE 128-bit, this patch goes through the built-in functions > and changes the name of the math, scanf, and printf built-in functions to use > the functions that GLIBC provides when long double uses the IEEE 128-bit > representation. ok. > > In addition, changing the name in GCC allows the Fortran compiler to > automatically use the correct name. Does the fortran compiler currently use the wrong name? (pr?) > > To map the math functions, typically this patch changes l to > __ieee128. However there are some exceptions that are handled with this > patch. This appears to be the rs6000_mangle_decl_assembler_name() function, which also maps l_r to ieee128_r, and looks like some additional special handling for printf and scanf. > To map the printf functions, is mapped to __ieee128. > > To map the scanf functions, is mapped to __isoc99_ieee128. 
> > I have tested this patch by doing builds, bootstraps, and make check with 3 > builds on a power9 little endian server: > > * Build one used the default long double being IBM 128-bit; > * Build two set the long double default to IEEE 128-bit; (and) > * Build three set the long double default to 64-bit. > ok > The compilers built fine providing I recompiled gmp, mpc, and mpfr with the > appropriate long double options. Presumably the build is otherwise broken... Does that mean more than invoking download_prerequisites as part of the build? If there are specific options required during configure/build of those packages, they should be called out. > There were a few differences in the test > suite runs that will be addressed in later patches, but over all it works > well. Presumably minimal. :-) > This patch is required to be able to build a toolchain where the default > long double is IEEE 128-bit. Ok. Could lead the patch description with this. I imagine this is just one of several patches that are still required towards that goal. > Can I check this patch into the master branch for > GCC 11? > > gcc/ > 2021-01-14 Michael Meissner > > * config/rs6000/rs6000.c (ieee128_builtin_name): New function. > (built_in_uses_long_double): New function. > (identifier_ends_in_suffix): New function. > (rs6000_mangle_decl_assembler_name): Update support for mapping built-in > function names for long double built-in functions if long double is > IEEE 128-bit to catch all of the built-in functions that take or > return long double arguments. > > gcc/testsuite/ > 2021-01-14 Michael Meissner > > * gcc.target/powerpc/float128-longdouble-math.c: New test. > * gcc.target/powerpc/float128-longdouble-stdio.c: New test. > * gcc.target/powerpc/float128-math.c: Adjust test for new name > being generated. Add support for running test on power10. Add > support for running if long double defaults to 64-bits. 
> --- > gcc/config/rs6000/rs6000.c| 239 -- > .../powerpc/float128-longdouble-math.c| 442 ++ > .../powerpc/float128-longdouble-stdio.c | 36 ++ > .../gcc.target/powerpc/float128-math.c| 16 +- > 4 files changed, 694 insertions(+), 39 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/powerpc/float128-longdouble-math.c > create mode 100644 > gcc/testsuite/gcc.target/powerpc/float128-longdouble-stdio.c > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 6f48dd6566d..282703b9715 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -27100,6 +27100,172 @@ rs6000_globalize_decl_name (FILE * stream, tree > decl) > #endif > > > +/* If long double uses the IEEE 128-bit representation, return the name used > + within GLIBC for the IEEE 128-bit long double built-in, instead of the > + default IBM 128-bit long double built-in. Or return NULL if the built-in > + function does not use long double. */
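For readers skimming the thread, the renaming described above can be sketched in plain C. This is only an illustration, not the code from the patch: the placeholder names in the quoted description were lost in this archive copy, so the sketch assumes the patterns are "<name>l" to "__<name>ieee128", "<x>printf" to "__<x>printfieee128", and "<x>scanf" to "__isoc99_<x>scanfieee128". The helper names ends_with and map_ieee128_name are hypothetical, and the "l_r" exceptions mentioned in the review are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if S ends with SUFFIX.  */
static int
ends_with (const char *s, const char *suffix)
{
  size_t ls = strlen (s), lx = strlen (suffix);
  return ls >= lx && strcmp (s + ls - lx, suffix) == 0;
}

/* Hypothetical sketch of the name mapping (assumed patterns, see above).
   Writes the mapped name to OUT and returns 1, or returns 0 if NAME does
   not look like a long double function or the buffer is too small.  */
static int
map_ieee128_name (const char *name, char *out, size_t outsz)
{
  size_t len = strlen (name);
  int n;

  if (ends_with (name, "printf"))
    n = snprintf (out, outsz, "__%sieee128", name);
  else if (ends_with (name, "scanf"))
    n = snprintf (out, outsz, "__isoc99_%sieee128", name);
  else if (len >= 2 && name[len - 1] == 'l')
    /* Drop the trailing 'l' of the long double math function.  */
    n = snprintf (out, outsz, "__%.*sieee128", (int) (len - 1), name);
  else
    return 0;

  return n > 0 && (size_t) n < outsz;
}
```

Under those assumptions, map_ieee128_name ("sinl", buf, sizeof buf) would leave "__sinieee128" in buf, and a name such as "sin" would be left unmapped.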
Re: [PATCH, rs6000] improve vec_ctf invalid parameter handling. (pr91903)
Ping! Thanks -Will On Mon, 2021-01-04 at 18:03 -0600, will schmidt via Gcc-patches wrote: > On Mon, 2020-10-26 at 16:22 -0500, will schmidt wrote: > > [PATCH, rs6000] improve vec_ctf invalid parameter handling. > > > > Hi, > > Per PR91903, GCC ICEs when we attempt to pass a variable > > (or out of range value) into the vec_ctf() builtin. Per > > investigation, the parameter checking exists for this > > builtin with the int types, but was missing for > > the long long types. > > > > This patch adds the missing CODE_FOR_* entries to the > > rs6000_expand_binup_builtin to cover that scenario. > > This patch also updates some existing tests to remove > > calls to vec_ctf() and vec_cts() that contain negative > > values. > > > > Regtested clean on power7, power8, power9 Linux targets. > > > > OK for trunk? > > > I've reviewed the list archives in case my local inbox lost a > response.. I don't think this one was reviewed. > so.. > > ping! > > :-) > > thanks > -Will > > > > > > THanks, > > -Will > > > > PR target/91903 > > > > 2020-10-26 Will Schmidt > > > > gcc/ChangeLog: > > * config/rs6000/rs6000-call.c (rs6000_expand_binup_builtin): > > Add > > clauses for CODE_FOR_vsx_xvcvuxddp_scale and > > CODE_FOR_vsx_xvcvsxddp_scale to the parameter checking code. > > > > gcc/testsuite/ChangeLog: > > * testsuite/gcc.target/powerpc/pr91903.c: New test. > > * testsuite/gcc.target/powerpc/builtins-1.fold.h: Update. > > * testsuite/gcc.target/powerpc/builtins-2.c: Update. 
> > > > diff --git a/gcc/config/rs6000/rs6000-call.c > > b/gcc/config/rs6000/rs6000-call.c > > index b044778a7ae4..eb7e007e68d3 100644 > > --- a/gcc/config/rs6000/rs6000-call.c > > +++ b/gcc/config/rs6000/rs6000-call.c > > @@ -9447,11 +9447,13 @@ rs6000_expand_binop_builtin (enum insn_code > > icode, tree exp, rtx target) > > } > > } > >else if (icode == CODE_FOR_altivec_vcfux > >|| icode == CODE_FOR_altivec_vcfsx > >|| icode == CODE_FOR_altivec_vctsxs > > - || icode == CODE_FOR_altivec_vctuxs) > > + || icode == CODE_FOR_altivec_vctuxs > > + || icode == CODE_FOR_vsx_xvcvuxddp_scale > > + || icode == CODE_FOR_vsx_xvcvsxddp_scale) > > { > >/* Only allow 5-bit unsigned literals. */ > >STRIP_NOPS (arg1); > >if (TREE_CODE (arg1) != INTEGER_CST > > || TREE_INT_CST_LOW (arg1) & ~0x1f) > > diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h > > b/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h > > index 8bc5f5e43366..42d552295e3e 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h > > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h > > @@ -212,14 +212,14 @@ int main () > >extern vector unsigned long long u9; u9 = vec_mergeo (u3, u4); > > > >extern vector long long l8; l8 = vec_mul (l3, l4); > >extern vector unsigned long long u6; u6 = vec_mul (u3, u4); > > > > - extern vector double dh; dh = vec_ctf (la, -2); > > + extern vector double dh; dh = vec_ctf (la, 2); > >extern vector double di; di = vec_ctf (ua, 2); > >extern vector int sz; sz = vec_cts (fa, 0x1F); > > - extern vector long long l9; l9 = vec_cts (dh, -2); > > + extern vector long long l9; l9 = vec_cts (dh, 2); > >extern vector unsigned long long u7; u7 = vec_ctu (di, 2); > >extern vector unsigned int usz; usz = vec_ctu (fa, 0x1F); > > > >extern vector float f1; f1 = vec_mergee (fa, fb); > >extern vector float f2; f2 = vec_mergeo (fa, fb); > > diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-2.c > > b/gcc/testsuite/gcc.target/powerpc/builtins-2.c > > index 
2aa23a377992..30acae47faff 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/builtins-2.c > > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-2.c > > @@ -40,16 +40,16 @@ int main () > > > >if (se[0] != 27L || se[1] != 27L || sf[0] != -14L || sf[1] != > > -14L > >|| ue[0] != 27L || ue[1] != 27L || uf[0] != 14L || uf[1] != > > 14L) > > abort (); > > > > - vector double da = vec_ctf (sa, -2); > > + vector double da = vec_ctf (sa, 2); > >vector double db = vec_ctf (ua, 2); > > - vector long long sg = vec_cts (da, -2); > > + vector long long sg = vec_cts (da, 2); > >vector unsigned long long ug = vec_ctu (db, 2); > > > > - if (da[0] != 1
Re: [PATCH, rs6000] improve vec_ctf invalid parameter handling. (pr91903)
On Wed, 2021-01-27 at 18:24 -0600, Segher Boessenkool wrote: > Hi! > > On Mon, Oct 26, 2020 at 04:22:32PM -0500, will schmidt wrote: > > Per PR91903, GCC ICEs when we attempt to pass a variable > > (or out of range value) into the vec_ctf() builtin. Per > > investigation, the parameter checking exists for this > > builtin with the int types, but was missing for > > the long long types. > > > > This patch adds the missing CODE_FOR_* entries to the > > rs6000_expand_binup_builtin to cover that scenario. > > This patch also updates some existing tests to remove > > calls to vec_ctf() and vec_cts() that contain negative > > values. > > --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h > > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h > > @@ -212,14 +212,14 @@ int main () > >extern vector unsigned long long u9; u9 = vec_mergeo (u3, u4); > > > >extern vector long long l8; l8 = vec_mul (l3, l4); > >extern vector unsigned long long u6; u6 = vec_mul (u3, u4); > > > > - extern vector double dh; dh = vec_ctf (la, -2); > > + extern vector double dh; dh = vec_ctf (la, 2); > >extern vector double di; di = vec_ctf (ua, 2); > >extern vector int sz; sz = vec_cts (fa, 0x1F); > > - extern vector long long l9; l9 = vec_cts (dh, -2); > > + extern vector long long l9; l9 = vec_cts (dh, 2); > > I think removing the negative inputs here reduces test coverage? Why > did you change them, it isn't immediately clear to me? The vec_ctf() and vec_cts() builtins accept a const int parameter which should be in the range of 0..31. The PR was initially written/described as an ICE when a variable was passed into the builtin, and part of debug/fixups revealed that the testcase negative values were also invalid. I'll clarify that in the commit message. > > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/pr91903.c > > @@ -0,0 +1,74 @@ > > +/* { dg-do compile */ > > +/* { dg-require-effective-target p8vector_hw } */ > > Compile tests should use p8vector_ok, instead. 
(We do not care what > kind of hardware the system under test is: we can run this on a > cross- > compiler just fine, after all!) ok > > > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > > Please skip this line. If the test does not work for Darwin Iain can > easily disable it, but if you do, no one will find out if it does > work. ok, sounds good. > > Okay for trunk with those things fixed, and the -2 thing looked at. > Thanks! > Thanks for the review. :-) > > Segher
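As an aside on the "-2 thing": the check quoted from rs6000_expand_binop_builtin rejects any scale argument with bits set outside the low five. A minimal stand-alone sketch of that mask test follows; the function name is_5bit_unsigned_literal is made up for illustration, and the sketch only covers the range test, not the requirement that the argument be an INTEGER_CST.

```c
#include <stdbool.h>

/* Hypothetical stand-alone version of the range test
   "TREE_INT_CST_LOW (arg1) & ~0x1f" from the quoted patch:
   a value is acceptable only if no bit above bit 4 is set.  */
static bool
is_5bit_unsigned_literal (long value)
{
  return (value & ~0x1fL) == 0;
}
```

With this test 0 and 31 pass, while 32 and -2 fail, which matches the explanation that the old vec_ctf (la, -2) calls in the tests were themselves invalid.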
Re: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.
On Wed, 2021-01-27 at 19:43 -0600, Segher Boessenkool wrote: > On Wed, Jan 27, 2021 at 01:06:46PM -0600, will schmidt wrote: > > On Thu, 2021-01-14 at 11:59 -0500, Michael Meissner via Gcc-patches > > wrote: > > > November 19th, 2020: > > > Message-ID: <20201119235814.ga...@ibm-toto.the-meissners.org> > > > > Subject and date should be sufficient > > Only if people pick good subjects, and do not send ten patches with a > similar subject line on the same day. I asked for the message id, > that works pretty much everywhere. Good points.. I wasn't aware you had specifically asked for the message ids. Thanks for clarifying the situation. :-) > > > _if_ having the old versions > > of the patches is necessary to review the latest version of the > > patch. Which ideally is not the case. > > Stronger than that: I need to know what changed! So please just > explain > what changed, in just a short sentence or two, or more if that is > needed > (but not if it is not needed). > > > Segher
Re: [PATCH] Add conversions between _Float128 and Decimal.
On Thu, 2021-01-28 at 21:42 -0500, Michael Meissner via Gcc-patches wrote: > [PATCH] Add conversions between _Float128 and Decimal. > Hi, Just a couple cosmetic nits in the description. The changelog seems to match that patch contents OK. > This patch implements conversions between _Float128 and the 3 Decimal > floating types. It does by extendending the dfp-bit conversions to add a nit: 'does so' or 'does this' > new binary floating point type (KF), and doing the conversions in the same > mannor as the other binary/decimal conversions. manner. > > In particular for conversions from _Float128 to Decimal, it uses a sprintf > variant to convert _Float128 to strings, and a type specific function that > converts the string output to the appropriate Decimal type > > For conversions from one of the Decimal types to _Float128, it uses a decimal > function to convert to string (i.e. __decimalToString), and then uses a > variant of strtold to convert to _Float128. Are the sprintf and strtold functions called actually variants? > > If the user is linked against GLIBC 2.32 or newer, then the sprintf and > strtold > variant functions can use the features directly in GLIBC 2.32 to do this > conversion. > > If you have an older GLIBC and want to convert _Float128 to one of the Decimal > types, it will convert the _Float128 to __ibm128 and then convert that to > Decimal. > > Similarly if you have one of the Decimal types, and want to convert to > _Float128, it will first convert the Decimal type to __ibm128, and then > convert > __ibm128 to _Float128. > > These functions will primarily be used if/when the default PowerPC long double > type is changed to IEEE 128-bit, but they could also be used if the user > explicitly converts _Float128 to/from a Decimal type. > > One test case relating to Decimal fails if I build a compiler where the > default > is IEEE 128-bit: > > * c-c++-common/dfp/convert-bfp-11.c > > I have patches for this test, and they have been submitted separately. 
ok > > I have tested this patch by doing builds, bootstraps, and make check with 3 > builds on a power9 little endian server: > > * Build one used the default long double being IBM 128-bit; > * Build two set the long double default to IEEE 128-bit; (and) > * Build three set the long double default to 64-bit. > > The compilers built fine providing I recompiled gmp, mpc, and mpfr with the > appropriate long double options. There were a few differences in the test > suite runs that will be addressed in later patches, but over all it works > well. This patch is required to be able to build a toolchain where the > default > long double is IEEE 128-bit. > Can I check this patch into the master branch > for > GCC 11? Separate the check-in question from the description paragraph. > > I have also built compilers with this patch on a big endian power8 system that > has both 32-bit and 64-bit support. There were no regressions in running > these > tests on the system. > > Can I check this patch into the master branch? > > libgcc/ > 2021-01-28 Michael Meissner > > * config/rs6000/_dd_to_kf.c: New file. > * config/rs6000/_kf_to_dd.c: New file. > * config/rs6000/_kf_to_sd.c: New file. > * config/rs6000/_kf_to_td.c: New file. > * config/rs6000/_sd_to_kf.c: New file. > * config/rs6000/_sprintfkf.c: New file. > * config/rs6000/_sprintfkf.h: New file. > * config/rs6000/_strtokf.h: New file. > * config/rs6000/_strtokf.c: New file. > * config/rs6000/_td_to_kf.c: New file. > * config/rs6000/quad-float128.h: Add new declarations. > * config/rs6000/t-float128 (fp128_dec_funcs): New macro. > (fp128_decstr_funcs): New macro. > (ibm128_dec_funcs): New macro. > (fp128_ppc_funcs): Add the new conversions. > (fp128_dec_objs): Force Decimal <-> __float128 conversions to be > compiled with -mabi=ieeelongdouble. > (fp128_decstr_objs): Force __float128 <-> string conversions to be > compiled with -mabi=ibmlongdouble. 
> (ibm128_dec_objs): Force Decimal <-> __float128 conversions to be > compiled with -mabi=ieeelongdouble. > (FP128_CFLAGS_DECIMAL): New macro. > (IBM128_CFLAGS_DECIMAL): New macro. > * dfp-bit.c (DFP_TO_BFP): Add PowerPC _Float128 support. > (BFP_TO_DFP): Add PowerPC _Float128 support. > * dfp-bit.h (BFP_KIND): Add new binary floating point kind for > IEEE 128-bit floating point. > (DFP_TO_BFP): Add PowerPC _Float128 support. > (BFP_TO_DFP): Add PowerPC _Float128 support. > (BFP_SPRINTF): New macro. > --- > libgcc/config/rs6000/_dd_to_kf.c | 37 ++ > libgcc/config/rs6000/_kf_to_dd.c | 37 ++ > libgcc/config/rs6000/_kf_to_sd.c | 37 ++ > libgcc/config/rs6000/_kf_to_td.c | 37 ++ > libgcc/config/rs6000/_sd_to_
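The "go through a string" technique in the description can be illustrated without any Decimal support. The sketch below is not the libgcc code: it only prints a long double with enough digits and parses it back, using the standard snprintf and strtold. The function name round_trip_via_string is hypothetical, and LDBL_DECIMAL_DIG assumes a C11 <float.h>.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical illustration of converting through a decimal string:
   print X with LDBL_DECIMAL_DIG significant digits, then parse the
   string back.  The real patch parses the string into a Decimal type
   instead of back into a binary type.  */
static long double
round_trip_via_string (long double x)
{
  char buf[128];

  /* "%.*Le" prints one digit before the point plus the given precision,
     so LDBL_DECIMAL_DIG - 1 yields LDBL_DECIMAL_DIG significant digits.  */
  snprintf (buf, sizeof buf, "%.*Le", LDBL_DECIMAL_DIG - 1, x);
  return strtold (buf, NULL);
}
```

The point of the digit count is that the intermediate string should carry enough precision that nothing is lost before the second conversion step.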
[committed] [rs6000] Fix typo in gcc.target/pr91903.c dg-require stanza
[PATCH, rs6000] Fix typo in gcc.target/pr91903.c dg-require stanza

Hi,
  I somehow messed up when I tested this change..
Committed as obvious, also had pre-approval blessing per offline discussion.

Fix obvious typo in testcase's dg-require stanza.

2021-01-29  Will Schmidt

testsuite/ChangeLog:
	* gcc.target/powerpc/pr91903.c: Fix dg-require stanza.

diff --git a/gcc/testsuite/gcc.target/powerpc/pr91903.c b/gcc/testsuite/gcc.target/powerpc/pr91903.c
index efd217e..3045d07 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr91903.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr91903.c
@@ -1,5 +1,5 @@
 /* { dg-do compile */
-/* { dg-require-effective-target p8vector_ok } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8" } */
 #include
Re: [PATCH, rs6000] Optimization for PowerPC 64bit constant generation [PR94395]
On Fri, 2021-01-29 at 11:11 +0800, HAO CHEN GUI via Gcc-patches wrote: > Hi, > Hi, just a couple cosmetic nits below. Thanks, > This patch tries to optimize PowerPC 64 bit constant generation > when > the constant can be transformed from a 32 bit or 16 bit constant by > rotating, shifting and mask AND. Presumably this *does* optimize the constant generation. :-) s/tries to optimize/optimizes/ > > The attachments are the patch diff file and change log file. > > Bootstrapped and tested on powerpc64le with no regressions. Is > this > okay for trunk? Any recommendations? Thanks a lot. > >PR target/94395 >* config/rs6000/rs6000.c (rs6000_emit_set_32bit_const, >rs6000_rotate_long_const, rs6000_peel_long_const): New functions. >(rs6000_emit_set_long_const, num_insns_constant_gpr): Call new >functions. >* testsuite/gcc.target/powerpc/pr94395.c: New test. ok On Fri, 2021-01-29 at 11:11 +0800, HAO CHEN GUI via Gcc-patches wrote: > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index f26fc13484b..bcb867ffe94 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -1109,6 +1109,9 @@ static tree rs6000_handle_longcall_attribute (tree *, > tree, tree, int, bool *); > static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool > *); > static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *); > static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree); > +static HOST_WIDE_INT rs6000_rotate_long_const (unsigned HOST_WIDE_INT, int > *); > +static HOST_WIDE_INT rs6000_peel_long_const (unsigned HOST_WIDE_INT, int *, > + int *); > static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT); > static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool); > static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, > bool); > @@ -5868,12 +5871,28 @@ num_insns_constant_gpr (HOST_WIDE_INT value) > >else if (TARGET_POWERPC64) > { > + int rotate, head, tail; > + 
HOST_WIDE_INT imm1, imm2; > + unsigned HOST_WIDE_INT uc = value; >HOST_WIDE_INT low = ((value & 0x) ^ 0x8000) - 0x8000; >HOST_WIDE_INT high = value >> 31; > >if (high == 0 || high == -1) > return 2; > > + /* A long constant can be transformed from both a 16 bit constant and > + a 32 bit constant. So we first test imm1 and imm2 if they're 16 > + bit. */ > + imm1 = rs6000_rotate_long_const (uc, &rotate); > + if (SIGNED_INTEGER_16BIT_P (imm1)) > + return 2; > + imm2 = rs6000_peel_long_const (uc, &head, &tail); > + if (SIGNED_INTEGER_16BIT_P (imm2)) > + return 2; > + if (SIGNED_INTEGER_NBIT_P (imm1, 32) > + || SIGNED_INTEGER_NBIT_P (imm2, 32)) > + return 3; > + Ok. >high >>= 1; > >if (low == 0 || low == high) > @@ -9720,6 +9739,96 @@ rs6000_emit_set_const (rtx dest, rtx source) >return true; > } > > +/* Function to load 32 a bit constant. */ > +static void > +rs6000_emit_set_32bit_const (rtx dest, HOST_WIDE_INT c) > +{ > + gcc_assert (SIGNED_INTEGER_NBIT_P (c, 32)); > + > + rtx temp = can_create_pseudo_p () ? gen_reg_rtx (DImode) : dest; > + > + if (SIGNED_INTEGER_16BIT_P (c)) > +emit_insn (gen_rtx_SET (dest, GEN_INT (c))); > + else > +{ > + emit_insn (gen_rtx_SET (copy_rtx (temp), > + GEN_INT (c & ~(HOST_WIDE_INT) 0x))); > + emit_insn (gen_rtx_SET (dest, > + gen_rtx_IOR (DImode, copy_rtx (temp), > +GEN_INT (c & 0x; > +} > +} ok > + > +/* Helper function of rs6000_emit_set_long_const to left rotate a long > + constant. It returns the result immediately when it finds a 32 bit > + constant. It at most rotates for 31 bits. > + For instant, the constant 0x1234 can be transformed to > + a 32 bit constant 0x4123 by left rotating 12 bits. 
*/ > +static HOST_WIDE_INT > +rs6000_rotate_long_const (unsigned HOST_WIDE_INT c, int *rot) > +{ > + int bitsize = GET_MODE_BITSIZE (DImode); > + bool found = false; > + unsigned HOST_WIDE_INT imm = c; > + unsigned HOST_WIDE_INT m = imm >> (bitsize - 1); > + int rotate = 0; > + > + while (rotate < 31 && !found) > +{ > + imm = imm << 1 | m; > + if (clz_hwi (imm) > 32 || clz_hwi (~imm) > 32) > + found = true; > + rotate++; > + m = imm >> (bitsize - 1); > +} > + > + *rot = rotate; > + return imm; > +} ok. > + > +/* Helper function of rs6000_emit_set_long_const to reutrn a constant by return > + removing consecutive 0s and 1s at the head and tail then setting all high > + bits. > + For instance, 0x00fff2345000 can be transformed to 0xfff2345 by > + peeling the head and tail, then to 0x234 by setting all > + high bits. Y
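To make the rotate helper easier to follow outside of GCC, here is a stand-alone sketch of the same loop using uint64_t in place of HOST_WIDE_INT. It is an approximation for illustration only: clz64 is a hypothetical stand-in for clz_hwi (returning 64 for zero), and __builtin_clzll assumes GCC or Clang.

```c
#include <stdint.h>

/* Stand-in for GCC's clz_hwi: count leading zeros, 64 for zero.  */
static int
clz64 (uint64_t x)
{
  return x ? __builtin_clzll (x) : 64;
}

/* Sketch of the quoted rs6000_rotate_long_const: rotate C left one bit
   at a time (at most 31 times) and stop once the value fits in 32 bits
   as a sign-extended constant, i.e. has more than 32 leading zeros or
   more than 32 leading ones.  */
static uint64_t
rotate_long_const (uint64_t c, int *rot)
{
  uint64_t imm = c;
  int rotate = 0;
  int found = 0;

  while (rotate < 31 && !found)
    {
      imm = (imm << 1) | (imm >> 63);	/* Rotate left by one bit.  */
      if (clz64 (imm) > 32 || clz64 (~imm) > 32)
	found = 1;
      rotate++;
    }

  *rot = rotate;
  return imm;
}
```

For example, 0x5000000000001234 becomes 0x12345 after rotating left by 4 bits, so under this scheme it could be built from a short constant followed by a single rotate.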
Re: [PATCH] rs6000: Convert the vector element register to SImode [PR98914]
On Wed, 2021-02-03 at 03:01 -0600, Xionghu Luo via Gcc-patches wrote: Hi, > v[k] will also be expanded to IFN VEC_SET if k is long type when > built > with -Og. -O0 didn't exposed the issue due to v is TREE_ADDRESSABLE, > -O1 and above also didn't capture it because of v[k] is not optimized > to > VIEW_CONVERT_EXPR(v)[k_1]. > vec_insert defines the element argument type to be signed int by > ELFv2 > ABI, so convert it to SImode if it wasn't for Power target > requirements. The intro paragraph seems to start mid sentence. Did something get cut off? The description here is specific to the reported testcase failure. This should describe the patch behavior instead. Something like "When expanding a vector with a variable rtx, the rtx type needs to be SI" ... (I defer to any other suggestions of better or improved wording). > > gcc/ChangeLog: Reference "PR target/98914" somewhere in here. > > 2021-02-03 Xionghu Luo > > * config/rs6000/rs6000.c (rs6000_expand_vector_set): Convert > elt_rtx to SImode if it wasn't. s/if it wasn't// > > gcc/testsuite/ChangeLog: > > 2021-02-03 Xionghu Luo > > * gcc.target/powerpc/pr98914.c: New test. > --- > gcc/config/rs6000/rs6000.c | 17 ++--- > gcc/testsuite/gcc.target/powerpc/pr98914.c | 11 +++ > 2 files changed, 21 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr98914.c > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index ec068c58aa5..9f7f8da56c6 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -7000,8 +7000,6 @@ rs6000_expand_vector_set_var_p9 (rtx target, > rtx val, rtx idx) > >gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx)); > > - gcc_assert (GET_MODE (idx) == E_SImode); > - >machine_mode inner_mode = GET_MODE (val); > >rtx tmp = gen_reg_rtx (GET_MODE (idx)); This needs a changelog blurb. 
> @@ -7047,8 +7045,6 @@ rs6000_expand_vector_set_var_p8 (rtx target, > rtx val, rtx idx) > >gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx)); > > - gcc_assert (GET_MODE (idx) == E_SImode); > - >machine_mode inner_mode = GET_MODE (val); >HOST_WIDE_INT mode_mask = GET_MODE_MASK (inner_mode); Same. > > @@ -7144,7 +7140,7 @@ rs6000_expand_vector_set (rtx target, rtx val, > rtx elt_rtx) >machine_mode mode = GET_MODE (target); >machine_mode inner_mode = GET_MODE_INNER (mode); >rtx reg = gen_reg_rtx (mode); > - rtx mask, mem, x; > + rtx mask, mem, x, elt_si; >int width = GET_MODE_SIZE (inner_mode); >int i; > > @@ -7154,16 +7150,23 @@ rs6000_expand_vector_set (rtx target, rtx > val, rtx elt_rtx) > { >if (!CONST_INT_P (elt_rtx)) > { > + /* elt_rtx should be SImode from ELFv2 ABI. */ > + elt_si = gen_reg_rtx (E_SImode); > + if (GET_MODE (elt_rtx) != E_SImode) > + convert_move (elt_si, elt_rtx, 0); > + else > + elt_si = elt_rtx; > + ok. > /* For V2DI/V2DF, could leverage the P9 version to generate > xxpermdi >when elt_rtx is variable. */ > if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8) > { > - rs6000_expand_vector_set_var_p9 (target, val, elt_rtx); > + rs6000_expand_vector_set_var_p9 (target, val, elt_si); > return; > } > else if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT) > { > - rs6000_expand_vector_set_var_p8 (target, val, elt_rtx); > + rs6000_expand_vector_set_var_p8 (target, val, elt_si); > return; > } > } > diff --git a/gcc/testsuite/gcc.target/powerpc/pr98914.c > b/gcc/testsuite/gcc.target/powerpc/pr98914.c > new file mode 100644 > index 000..e4d78e3e6b3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr98914.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_p8vector_ok } */ > +/* { dg-options "-Og -mvsx" } */ > + > +vector int > +foo (vector int v) > +{ > + for (long k = 0; k < 1; ++k) > +v[k] = 0; > + return v; > +} ok thanks -Will
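The source-level pattern behind this PR can also be shown with GCC's generic vector extension, which is assumed here so the sketch compiles on targets without AltiVec. The typedef v4si and the function set_element are illustrative names only; the explicit cast mirrors what the patch does at RTL level when it converts a non-SImode index to SImode.

```c
/* Generic GCC vector type, used instead of the AltiVec "vector int"
   from the test case so the sketch is not PowerPC-specific.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Set element K of V to VALUE.  K arrives as a long, as in the PR test
   case; narrowing it to int corresponds to the SImode conversion the
   patch adds before calling the variable vector-set expanders.  */
static v4si
set_element (v4si v, long k, int value)
{
  int idx = (int) k;
  v[idx] = value;
  return v;
}
```

A call such as set_element (v, 2L, 0) then clears the third element and leaves the others unchanged.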
Re: [PATCH] rs6000: Use rldimi for vec init instead of shift + ior
On Wed, 2021-02-03 at 14:37 +0800, Kewen.Lin via Gcc-patches wrote: > Hi, > Hi, > This patch merges the previously approved one[1] and its relied patch I don't see the review for [1] in the archives. > made by Segher here[2], it's to make unsigned int vector init go with > rldimi to merge two integers instead of shift and ior. > > Segher's patch in [2] is required to make the test case pass, > otherwise the costing for new pseudo-to-pseudo copies and the folding > with nonzero_bits in combine will make the rl*imi pattern become > compact and split into ior and shift unexpectedly. > > The commit log of Segher's patch describes it in more details: > > "An rl*imi is usually written as an IOR of an ASHIFT or similar, and an > AND of a register with a constant mask. In some cases combine knows > that that AND doesn't do anything (because all zero bits in that mask > correspond to bits known to be already zero), and then no pattern > matches. This patch adds a define_split for such cases. It uses > nonzero_bits in the condition of the splitter, but does not need it > afterwards for the instruction to be recognised. This is necessary > because later passes can see fewer nonzero_bits. > > Because it is a splitter, combine will only use it when starting with > three insns (or more), even though the result is just one. This isn't > a huge problem in practice, but some possible combinations still won't > happen." > > Bootstrapped/regtested on powerpc64le-linux-gnu P9 and > powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9. > > Is it ok for trunk? > > BR, > Kewen > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html > [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html > > > gcc/ChangeLog: > > 2020-02-03 Segher Boessenkool > Kewen Lin > > * config/rs6000/rs6000.md (*rotl3_insert_3): Renamed to... > (rotl3_insert_3): ...this. ok > (plus_ior_xor): New code_iterator. > (define_split for GPR rl*imi): New splitter. 
As described above, these two changes appear to be identical to what was posted by Segher in [2]. > * config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3 > for integer merging. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/vec-init-10.c: New test. +/* { dg-final { scan-assembler-not "or" } } */ As is, it looks OK to me. Per other reviews I've gotten, you may get a request to wrap the "or" with \m \M . Some existing cases with scan-assembler-not have a leading whitespace/tab qualifier too. i.e. /* { dg-final { scan-assembler-not "\[ \t\]or " } } */ Thanks, -Will > > - >
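For context, the integer merge that the patch wants done with a single rldimi is, at C level, a shift plus an inclusive OR. The sketch below is only a source-level picture of that operation, with made-up function names; it also shows the case from Segher's commit log where the AND with the mask is a no-op because the masked-off bits are already known to be zero.

```c
#include <stdint.h>

/* Merge two 32-bit values into one 64-bit value: HI goes to the upper
   half, LO to the lower half.  This shift + ior shape is what the
   patch aims to emit as one rotate-and-insert instruction.  */
static uint64_t
merge_u32 (uint32_t hi, uint32_t lo)
{
  return ((uint64_t) hi << 32) | (uint64_t) lo;
}

/* Same merge written with an explicit mask on LO.  Because LO was
   zero-extended from 32 bits, the AND changes nothing; this is the
   kind of redundant AND that combine can drop, leaving no matching
   rl*imi pattern without the new splitter.  */
static uint64_t
merge_u32_masked (uint32_t hi, uint32_t lo)
{
  return ((uint64_t) hi << 32) | ((uint64_t) lo & 0xffffffffULL);
}
```

Both functions return the same value for every input, which is why the mask can disappear during optimization.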
Re: [PATCH 04/34] rs6000: Add VSX builtins
On Thu, 2021-07-29 at 08:30 -0500, Bill Schmidt wrote: > 2021-06-07 Bill Schmidt > Hi, > gcc/ > * config/rs6000/rs6000-builtin-new.def: Add vsx stanza. > --- > gcc/config/rs6000/rs6000-builtin-new.def | 857 +++ > 1 file changed, 857 insertions(+) > ok > diff --git a/gcc/config/rs6000/rs6000-builtin-new.def > b/gcc/config/rs6000/rs6000-builtin-new.def > index f1aa5529cdd..974cdc8c37c 100644 > --- a/gcc/config/rs6000/rs6000-builtin-new.def > +++ b/gcc/config/rs6000/rs6000-builtin-new.def > @@ -1028,3 +1028,860 @@ > >const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>); > VEC_SET_V8HI nothing {set} > + > + > +; VSX builtins. > +[vsx] > + pure vd __builtin_altivec_lvx_v2df (signed long, const void *); > +LVX_V2DF altivec_lvx_v2df {ldvec} > + > + pure vsll __builtin_altivec_lvx_v2di (signed long, const void *); > +LVX_V2DI altivec_lvx_v2di {ldvec} > + > + pure vd __builtin_altivec_lvxl_v2df (signed long, const void *); > +LVXL_V2DF altivec_lvxl_v2df {ldvec} > + > + pure vsll __builtin_altivec_lvxl_v2di (signed long, const void *); > +LVXL_V2DI altivec_lvxl_v2di {ldvec} > + > + const vd __builtin_altivec_nabs_v2df (vd); > +NABS_V2DF vsx_nabsv2df2 {} > + > + const vsll __builtin_altivec_nabs_v2di (vsll); > +NABS_V2DI nabsv2di2 {} > + > + void __builtin_altivec_stvx_v2df (vd, signed long, void *); > +STVX_V2DF altivec_stvx_v2df {stvec} > + > + void __builtin_altivec_stvx_v2di (vsll, signed long, void *); > +STVX_V2DI altivec_stvx_v2di {stvec} > + > + void __builtin_altivec_stvxl_v2df (vd, signed long, void *); > +STVXL_V2DF altivec_stvxl_v2df {stvec} > + > + void __builtin_altivec_stvxl_v2di (vsll, signed long, void *); > +STVXL_V2DI altivec_stvxl_v2di {stvec} > + > + const vd __builtin_altivec_vand_v2df (vd, vd); > +VAND_V2DF andv2df3 {} > + > + const vsll __builtin_altivec_vand_v2di (vsll, vsll); > +VAND_V2DI andv2di3 {} > + > + const vull __builtin_altivec_vand_v2di_uns (vull, vull); > +VAND_V2DI_UNS andv2di3 {} > + > + const vd 
__builtin_altivec_vandc_v2df (vd, vd); > +VANDC_V2DF andcv2df3 {} > + > + const vsll __builtin_altivec_vandc_v2di (vsll, vsll); > +VANDC_V2DI andcv2di3 {} > + > + const vull __builtin_altivec_vandc_v2di_uns (vull, vull); > +VANDC_V2DI_UNS andcv2di3 {} > + > + const vsll __builtin_altivec_vcmpequd (vull, vull); > +VCMPEQUD vector_eqv2di {} > + > + const int __builtin_altivec_vcmpequd_p (int, vsll, vsll); > +VCMPEQUD_P vector_eq_v2di_p {pred} > + > + const vsll __builtin_altivec_vcmpgtsd (vsll, vsll); > +VCMPGTSD vector_gtv2di {} > + > + const int __builtin_altivec_vcmpgtsd_p (int, vsll, vsll); > +VCMPGTSD_P vector_gt_v2di_p {pred} > + > + const vsll __builtin_altivec_vcmpgtud (vull, vull); > +VCMPGTUD vector_gtuv2di {} > + > + const int __builtin_altivec_vcmpgtud_p (int, vsll, vsll); > +VCMPGTUD_P vector_gtu_v2di_p {pred} > + > + const vd __builtin_altivec_vnor_v2df (vd, vd); > +VNOR_V2DF norv2df3 {} > + > + const vsll __builtin_altivec_vnor_v2di (vsll, vsll); > +VNOR_V2DI norv2di3 {} > + > + const vull __builtin_altivec_vnor_v2di_uns (vull, vull); > +VNOR_V2DI_UNS norv2di3 {} > + > + const vd __builtin_altivec_vor_v2df (vd, vd); > +VOR_V2DF iorv2df3 {} > + > + const vsll __builtin_altivec_vor_v2di (vsll, vsll); > +VOR_V2DI iorv2di3 {} > + > + const vull __builtin_altivec_vor_v2di_uns (vull, vull); > +VOR_V2DI_UNS iorv2di3 {} > + > + const vd __builtin_altivec_vperm_2df (vd, vd, vuc); > +VPERM_2DF altivec_vperm_v2df {} > + > + const vsll __builtin_altivec_vperm_2di (vsll, vsll, vuc); > +VPERM_2DI altivec_vperm_v2di {} > + > + const vull __builtin_altivec_vperm_2di_uns (vull, vull, vuc); > +VPERM_2DI_UNS altivec_vperm_v2di_uns {} > + > + const vd __builtin_altivec_vreve_v2df (vd); > +VREVE_V2DF altivec_vrevev2df2 {} > + > + const vsll __builtin_altivec_vreve_v2di (vsll); > +VREVE_V2DI altivec_vrevev2di2 {} > + > + const vd __builtin_altivec_vsel_2df (vd, vd, vd); > +VSEL_2DF vector_select_v2df {} > + > + const vsll __builtin_altivec_vsel_2di (vsll, vsll, vsll); > 
+VSEL_2DI_B vector_select_v2di {} > + > + const vull __builtin_altivec_vsel_2di_uns (vull, vull, vull); > +VSEL_2DI_UNS vector_select_v2di_uns {} > + > + const vd __builtin_altivec_vsldoi_2df (vd, vd, const int<4>); > +VSLDOI_2DF altivec_vsldoi_v2df {} > + > + const vsll __builtin_altivec_vsldoi_2di (vsll, vsll, const int<4>); > +VSLDOI_2DI altivec_vsldoi_v2di {} > + > + const vd __builtin_altivec_vxor_v2df (vd, vd); > +VXOR_V2DF xorv2df3 {} > + > + const vsll __builtin_altivec_vxor_v2di (vsll, vsll); > +VXOR_V2DI xorv2di3 {} > + > + const vull __builtin_altivec_vxor_v2di_uns (vull, vull); > +VXOR_V2DI_UNS xorv2di3 {} > + > + const signed __int128 __builtin_vec_ext_v1ti (vsq, signed int); > +VEC_EXT_V1TI nothing {extract} > + > + const double __builtin_vec_ex
Re: [PATCH 05/34] rs6000: Add available-everywhere and ancient builtins
On Thu, 2021-07-29 at 08:30 -0500, Bill Schmidt wrote: > 2021-06-07 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-builtin-new.def: Add always, power5, and > power6 stanzas. > --- > gcc/config/rs6000/rs6000-builtin-new.def | 72 > 1 file changed, 72 insertions(+) > > diff --git a/gcc/config/rs6000/rs6000-builtin-new.def > b/gcc/config/rs6000/rs6000-builtin-new.def > index 974cdc8c37c..ca694be1ac3 100644 > --- a/gcc/config/rs6000/rs6000-builtin-new.def > +++ b/gcc/config/rs6000/rs6000-builtin-new.def > @@ -184,6 +184,78 @@ > > > > +; Builtins that have been around since time immemorial or are just > +; considered available everywhere. > +[always] > + void __builtin_cpu_init (); > +CPU_INIT nothing {cpu} > + > + bool __builtin_cpu_is (string); > +CPU_IS nothing {cpu} > + > + bool __builtin_cpu_supports (string); > +CPU_SUPPORTS nothing {cpu} > + > + unsigned long long __builtin_ppc_get_timebase (); > +GET_TB rs6000_get_timebase {} > + > + double __builtin_mffs (); > +MFFS rs6000_mffs {} > + > +; This will break for long double == _Float128. libgcc history. Could you add a few more words here, to give bigger hints for future archaeological digs? (This is perhaps an obvious issue, but I'd need to do some spelunking.) I see similar comments below, so maybe just a wordier comment at the first occurrence. Unsure...
> + const long double __builtin_pack_longdouble (double, double); > +PACK_TF packtf {} > + > + unsigned long __builtin_ppc_mftb (); > +MFTB rs6000_mftb_di {32bit} > + > + void __builtin_mtfsb0 (const int<5>); > +MTFSB0 rs6000_mtfsb0 {} > + > + void __builtin_mtfsb1 (const int<5>); > +MTFSB1 rs6000_mtfsb1 {} > + > + void __builtin_mtfsf (const int<8>, double); > +MTFSF rs6000_mtfsf {} > + > + const __ibm128 __builtin_pack_ibm128 (double, double); > +PACK_IF packif {} > + > + void __builtin_set_fpscr_rn (const int[0,3]); > +SET_FPSCR_RN rs6000_set_fpscr_rn {} > + > + const double __builtin_unpack_ibm128 (__ibm128, const int<1>); > +UNPACK_IF unpackif {} > + > +; This will break for long double == _Float128. libgcc history. > + const double __builtin_unpack_longdouble (long double, const int<1>); > +UNPACK_TF unpacktf {} > + > + > +; Builtins that have been around just about forever, but not quite. > +[power5] > + fpmath double __builtin_recipdiv (double, double); > +RECIP recipdf3 {} > + > + fpmath float __builtin_recipdivf (float, float); > +RECIPF recipsf3 {} > + > + fpmath double __builtin_rsqrt (double); > +RSQRT rsqrtdf2 {} > + > + fpmath float __builtin_rsqrtf (float); > +RSQRTF rsqrtsf2 {} > + > + > +; Power6 builtins. I see in subsequent patches you also call out the ISA version in the comment. so perhaps ; Power6 builtins (ISA 2.05). Similar comment for Power5 reference above. > +[power6] > + const signed long __builtin_p6_cmpb (signed long, signed long); > +CMPB cmpbdi3 {} > + > + const signed int __builtin_p6_cmpb_32 (signed int, signed int); > +CMPB_32 cmpbsi3 {} > + > + ok. > ; AltiVec builtins. > [altivec] >const vsc __builtin_altivec_abs_v16qi (vsc);
Re: [PATCH 06/34] rs6000: Add power7 and power7-64 builtins
On Thu, 2021-07-29 at 08:30 -0500, Bill Schmidt wrote: > 2021-04-02 Bill Schmidt > Hi, > gcc/ > * config/rs6000/rs6000-builtin-new.def: Add power7 and power7-64 > stanzas. ok > --- > gcc/config/rs6000/rs6000-builtin-new.def | 39 > 1 file changed, 39 insertions(+) > > diff --git a/gcc/config/rs6000/rs6000-builtin-new.def > b/gcc/config/rs6000/rs6000-builtin-new.def > index ca694be1ac3..bffce52ee47 100644 > --- a/gcc/config/rs6000/rs6000-builtin-new.def > +++ b/gcc/config/rs6000/rs6000-builtin-new.def > @@ -1957,3 +1957,42 @@ > >const vsll __builtin_vsx_xxspltd_2di (vsll, const int<1>); > XXSPLTD_V2DI vsx_xxspltd_v2di {} > + > + > +; Power7 builtins (ISA 2.06). > +[power7] > + const unsigned int __builtin_addg6s (unsigned int, unsigned int); > +ADDG6S addg6s {} Add all of the sixes... (ok). > + > + const signed long __builtin_bpermd (signed long, signed long); > +BPERMD bpermd_di {} > + > + const unsigned int __builtin_cbcdtd (unsigned int); > +CBCDTD cbcdtd {} > + > + const unsigned int __builtin_cdtbcd (unsigned int); > +CDTBCD cdtbcd {} > + > + const signed int __builtin_divwe (signed int, signed int); > +DIVWE dive_si {} > + > + const unsigned int __builtin_divweu (unsigned int, unsigned int); > +DIVWEU diveu_si {} > + > + const vsq __builtin_pack_vector_int128 (unsigned long long, unsigned long > long); > +PACK_V1TI packv1ti {} > + > + void __builtin_ppc_speculation_barrier (); > +SPECBARR speculation_barrier {} > + > + const unsigned long __builtin_unpack_vector_int128 (vsq, const int<1>); > +UNPACK_V1TI unpackv1ti {} > + > + > +; Power7 builtins requiring 64-bit GPRs (even with 32-bit addressing). > +[power7-64] > + const signed long long __builtin_divde (signed long long, signed long > long); > +DIVDE dive_di {} > + > + const unsigned long long __builtin_divdeu (unsigned long long, unsigned > long long); > +DIVDEU diveu_di {} ok thanks -Will
Re: [PATCH] Generate XXSPLTIDP on power10.
On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote: > Generate XXSPLTIDP on power10. > > This patch implements XXSPLTIDP support for SF and DF scalar constants and > V2DF > vector constants. The XXSPLTIDP instruction is given a 32-bit immediate that > is converted to a vector of two DFmode constants. The immediate is in SFmode > format, so only constants that fit as SFmode values can be loaded with > XXSPLTIDP. ok > > I added a new constraint (eF) to match constants that can be loaded with the > XXSPLTIDP instruction. > > I have added a temporary switch (-mxxspltidp) to control whether or not the > XXSPLTIDP instruction is generated. How temporary? > > I added 3 new tests to test loading up SF/DF scalar and V2DF vector > constants. > > I have tested this with bootstrap compilers on power10 systems and there was > no > regression. I have built GCC with these patches on little endian power9 and > big endian power8 systems, and there were no regressions. > > In addition, I have built and run the full Spec 2017 rate suite, comparing > with > the patches enabled and not enabled. There were roughly 66,000 XXSPLTIDP's > generated in the rate build for Spec 2017. On a stand-alone system that is > running single threaded, blender_r has a 1.9% increase in performance, and > rest > of the benchmarks are performance neutral. However, I would expect that in a > real world scenario, switching to use XXSPLTIDP will increase performance due > to removing all of the loads. ok > > Can I check this into the master branch? > > 2021-08-25 Michael Meissner > > gcc/ > * config/rs6000/constraints.md (eF): New constraint. > * config/rs6000/predicates.md (easy_fp_constant): If we can load > the scalar constant with XXSPLTIDP, the floating point constant is > easy. Could be shortened to something like ? Add clause to accept xxspltidp_operand as easy. > (xxspltidp_operand): New predicate. Will there ever be another instruction using the SF/DF CONST_DOUBLE or V2DF CONST_VECTOR ? 
I tentatively question the name of the operand, but defer.. > (easy_vector_constant): If we can generate XXSPLTIDP, mark the > vector constant as easy. Duplicated from above. > * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New > declaration. > (prefixed_permute_p): Likewise. > * config/rs6000/rs6000.c (xxspltidp_constant_p): New function. > (output_vec_const_move): Add support for XXSPLTIDP. > (prefixed_permute_p): New function. Duplicated. > * config/rs6000/rs6000.md (prefixed attribute): Add support for > permute prefixed instructions. > (movsf_hardfloat): Add XXSPLTIDP support. > (mov_hardfloat32, FMOVE64 iterator): Likewise. > (mov_hardfloat64, FMOVE64 iterator): Likewise. > * config/rs6000/rs6000.opt (-mxxspltidp): New switch. > * config/rs6000/vsx.md (vsx_move_64bit): Add XXSPLTIDP > support. > (vsx_move_32bit): Likewise. No e in mov (per patch contents below). > (vsx_splat_v2df_xxspltidp): New insn. > (XXSPLTIDP): New mode iterator. > (xxspltidp__internal): New insn and splits. > (xxspltidp__inst): Replace xxspltidp_v2df_inst with an > iterated form that also does SFmode, and DFmode. Swap "an iterated form" with "xxspltidp__inst ? > > gcc/testsuite/ > * gcc.target/powerpc/vec-splat-constant-sf.c: New test. > * gcc.target/powerpc/vec-splat-constant-df.c: New test. > * gcc.target/powerpc/vec-splat-constant-v2df.c: New test. 
> --- > gcc/config/rs6000/constraints.md | 5 + > gcc/config/rs6000/predicates.md | 17 +++ > gcc/config/rs6000/rs6000-protos.h | 2 + > gcc/config/rs6000/rs6000.c| 106 ++ > gcc/config/rs6000/rs6000.md | 45 +--- > gcc/config/rs6000/rs6000.opt | 4 + > gcc/config/rs6000/vsx.md | 64 ++- > .../powerpc/vec-splat-constant-df.c | 60 ++ > .../powerpc/vec-splat-constant-sf.c | 60 ++ > .../powerpc/vec-splat-constant-v2df.c | 64 +++ > 10 files changed, 405 insertions(+), 22 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > > diff --git a/gcc/config/rs6000/constraints.md > b/gcc/config/rs6000/constraints.md > index c8cff1a3038..ea2e4a267c3 100644 > --- a/gcc/config/rs6000/constraints.md > +++ b/gcc/config/rs6000/constraints.md > @@ -208,6 +208,11 @@ (define_constraint "P" >(and (match_code "const_int") > (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < > 0x1"))) > > +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP > +(define_constra
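To make the restriction from the cover letter concrete: XXSPLTIDP carries a single-precision immediate, so a DFmode constant only qualifies if it survives a round trip through float unchanged. The following is a minimal portable C model of that one test, not the code from the patch. The name fits_sf_immediate is hypothetical, NaNs and infinities are simply rejected here, and the real xxspltidp_constant_p may exclude further cases that this sketch does not consider.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper, not from the patch: return true when the double
   value D is exactly representable as a float.  */
bool
fits_sf_immediate (double d)
{
  if (d != d)
    return false;	/* NaNs are left out of this model.  */
  if (d > FLT_MAX || d < -FLT_MAX)
    return false;	/* Outside float range; also rejects infinities.  */

  /* The volatile forces an actual rounding to single precision.  */
  volatile float f = (float) d;
  return (double) f == d;
}
```

Under this model 1.0 and -2.5 qualify, while 0.1 does not, since its double and float encodings denote different values.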
Re: [RS6000] rs6000_rtx_costs for AND
On Tue, 2020-09-15 at 10:49 +0930, Alan Modra via Gcc-patches wrote: > The existing "case AND" in this function is not sufficient for > optabs.c:avoid_expensive_constant usage, where the AND is passed in > outer_code. > > * config/rs6000/rs6000.c (rs6000_rtx_costs): Move costing for > AND to CONST_INT case. > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 32044d33977..523d029800a 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -21150,16 +21150,13 @@ rs6000_rtx_costs (rtx x, machine_mode mode, > int outer_code, > || outer_code == MINUS) > && (satisfies_constraint_I (x) > || satisfies_constraint_L (x))) > - || (outer_code == AND > - && (satisfies_constraint_K (x) > - || (mode == SImode > - ? satisfies_constraint_L (x) > - : satisfies_constraint_J (x > - || ((outer_code == IOR || outer_code == XOR) > + || ((outer_code == AND || outer_code == IOR || outer_code == > XOR) > && (satisfies_constraint_K (x) > || (mode == SImode > ? satisfies_constraint_L (x) > : satisfies_constraint_J (x > + || (outer_code == AND > + && rs6000_is_valid_and_mask (x, mode)) > || outer_code == ASHIFT > || outer_code == ASHIFTRT > || outer_code == LSHIFTRT > @@ -21196,7 +21193,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, > int outer_code, > || outer_code == IOR > || outer_code == XOR) > && (INTVAL (x) > -& ~ (unsigned HOST_WIDE_INT) 0x) == 0)) > +& ~ (unsigned HOST_WIDE_INT) 0x) == 0) > +|| (outer_code == AND > +&& rs6000_is_valid_2insn_and (x, mode))) > { > *total = COSTS_N_INSNS (1); > return true; > @@ -21334,26 +21333,6 @@ rs6000_rtx_costs (rtx x, machine_mode mode, > int outer_code, > *total += COSTS_N_INSNS (1); > return true; > } > - > - /* rotate-and-mask (no rotate), andi., andis.: 1 insn. 
*/ > - HOST_WIDE_INT val = INTVAL (XEXP (x, 1)); > - if (rs6000_is_valid_and_mask (XEXP (x, 1), mode) > - || (val & 0x) == val > - || (val & 0x) == val > - || ((val & 0x) == 0 && mode == SImode)) > - { > - *total = rtx_cost (left, mode, AND, 0, speed); > - *total += COSTS_N_INSNS (1); > - return true; > - } > - > - /* 2 insns. */ > - if (rs6000_is_valid_2insn_and (XEXP (x, 1), mode)) > - { > - *total = rtx_cost (left, mode, AND, 0, speed); > - *total += COSTS_N_INSNS (2); > - return true; > - } > } It's not exactly a one-for-one move, but I tentatively conclude that the /* rotate-and-mask */ lump of code here does go dead with the "case AND" changes above. Thanks, -Will > >*total = COSTS_N_INSNS (1);
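For readers following the one-insn case: the architectural fact behind it is that andi. takes an unsigned 16-bit immediate and andis. takes a 16-bit immediate shifted left by 16 bits. The sketch below models just that immediate test in portable C. The helper names are hypothetical and this is not the code from rs6000.c, which additionally accepts rotate-and-mask patterns via rs6000_is_valid_and_mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: VAL fits the unsigned 16-bit immediate of andi.  */
bool
fits_andi (uint64_t val)
{
  return (val & UINT64_C (0xffff)) == val;
}

/* Hypothetical helper: VAL fits the shifted 16-bit immediate of andis.  */
bool
fits_andis (uint64_t val)
{
  return (val & UINT64_C (0xffff0000)) == val;
}

/* True when a single and-immediate instruction can apply the mask.  */
bool
one_insn_and_immediate (uint64_t val)
{
  return fits_andi (val) || fits_andis (val);
}
```

A mask such as 0x01000200 has bits in both halves, so it fails both tests and needs either a different instruction form or more than one instruction.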
Re: [RS6000] rtx_costs
On Tue, 2020-09-15 at 10:49 +0930, Alan Modra via Gcc-patches wrote: > This patch series fixes a number of issues in rs6000_rtx_costs, the > aim being to provide costing somewhat closer to reality. Probably > the > most important patch of the series is patch 4, which just adds a > comment. Without the analysis that went into that comment, I found > myself making what seemed to be good changes but which introduced > regressions. > > So far these changes have not introduced any testsuite regressions > on --with-cpu=power8 and --with-cpu=power9 all lang bootstraps on > powerpc64le-linux. Pat spec tested on power9 against a baseline > master from a few months ago, seeing a few small improvements and no > degradations above the noise. I've read through all the patches in this series, (including the tests that were sent a bit later). Your use of comments does a good job helping describe whats going on. One comment/question/point of clarity for the AND patch that I'll send separately. That said, the series lgtm. :-) thanks, -Will > > Some notes: > > Examination of varasm.o shows quite a number of cases where > if-conversion succeeds due to different seq_cost. One example: > > extern int foo (); > int > default_assemble_integer (unsigned size) > { > extern unsigned long rs6000_isa_flags; > > if (size > (!((rs6000_isa_flags & (1UL << 35)) != 0) ? 4 : 8)) > return 0; > return foo (); > } > > This rather horrible code turns the rs6000_isa_flags value into > either > 4 or 8: > rldicr 9,9,28,0 > srdi 9,9,28 > addic 9,9,-1 > subfe 9,9,9 > rldicr 9,9,0,61 > addi 9,9,8 > Better would be > rldicl 9,9,29,63 > sldi 9,9,2 > addi 9,9,4 > > There is also a "rlwinm ra,rb,3,0,26" instead of "rldicr ra,rb,3,60", > and "li r31,0x4000; rotldi r31,r31,17" vs. > "lis r31,0x8000; clrldi r31,r31,32". > Neither of these is a real change. I saw one occurrence of a 5 insn > sequence being replaced with a load from memory in > default_function_rodata_section, for ".rodata", and others elsewhere. 
> > Sometimes correct insn cost leads to unexpected results. For > example: > > extern unsigned bar (void); > unsigned > f1 (unsigned a) > { > if ((a & 0x01000200) == 0x01000200) > return bar (); > return 0; > } > > emits for a & 0x01000200 > (set (reg) (and (reg) (const_int 0x01000200))) > at expand time (two rlwinm insns) rather than the older > (set (reg) (const_int 0x01000200)) > (set (reg) (and (reg) (reg))) > which is three insns. However, since 0x01000200 is needed later the > older code after optimisation is smaller.
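The rs6000_isa_flags example quoted earlier in this message can be checked in portable C. The shorter sequence (rldicl 9,9,29,63; sldi 9,9,2; addi 9,9,4) rotates bit 35 down to bit 0, scales it by 4 and adds 4. The sketch below is a model of that arithmetic under my reading of the instructions; the function names are made up for illustration.

```c
#include <stdint.h>

/* Rotate X left by N bits (0 <= N < 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* The expression as written in default_assemble_integer.  */
unsigned
limit_branchy (uint64_t flags)
{
  return !((flags & (UINT64_C (1) << 35)) != 0) ? 4 : 8;
}

/* Model of the three-instruction sequence.  */
unsigned
limit_rldicl (uint64_t flags)
{
  uint64_t bit = rotl64 (flags, 29) & 1;	/* rldicl 9,9,29,63 */
  bit <<= 2;					/* sldi 9,9,2 */
  return (unsigned) (bit + 4);			/* addi 9,9,4 */
}
```

Both functions return 8 when bit 35 is set and 4 otherwise, which is why the three-instruction form is a valid replacement for the six-instruction one.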
Re: [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values.
On Tue, 2020-08-11 at 12:23 -0700, Carl Love wrote: > Segher, Will: > > Patch 5 adds the 128-bit integer to/from 128-floating point > conversions. This patch has to invoke the routines to use the 128-bit > hardware instructions if on Power 10 or use software routines if > running on a pre Power 10 system via the resolve function. > > Carl Some mostly cosmetic bits below. Thanks -Will > > --- > Conversions between 128-bit integer and floating point values. > > gcc/ChangeLog > > 2020-08-10 Carl Love > config/rs6000/rs6000.md (floatunsti2, > fix_truncti2, fixuns_truncti2): Add > define_insn for mode IEEE 128. also floatti2 > libgcc/config/rs6000/fixkfi-sw.c: New file. > libgcc/config/rs6000/fixkfi.c: Remove file. Should that be fixkfti-sw.c (missing t)? Adjust to indicate this is a rename libgcc/config/rs6000/fixkfti.c: Rename to libgcc/config/rs6000/fixkfti-sw.c > libgcc/config/rs6000/fixunskfi-sw.c: New file. > libgcc/config/rs6000/fixunskfi.c: Remove file. > libgcc/config/rs6000/float128-hw.c (__floattikf_hw, > __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): > New functions. > libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): > New macro. > (__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve, > __fixunskfti_resolve): Add resolve functions. > (__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New > functions. > libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf, > __fixtfti, __fixunstfti): Add editor commands to change > names. > libgcc/config/rs6000/float128-sed-hw (__floattitf, > __floatuntitf, __fixtfti, __fixunstfti): Add editor commands > to change names. > libgcc/config/rs6000/floattikf-sw.c: New file. > libgcc/config/rs6000/floattikf.c: Remove file. > libgcc/config/rs6000/floatuntikf-sw.c: New file. > libgcc/config/rs6000/floatuntikf.c: Remove file. > libgcc/config/rs6000/floatuntikf-sw.c: New file. 
> libgcc/config/rs6000/quaad-float128.h (__floattikf_sw, > __floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw, > __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf, > __floatuntikf, __fixkfti, __fixunskfti):New extern declarations. no tab. > libgcc/config/rs6000/t-float128 (floattikf, floatuntikf, > fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs. > (floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add > file names to fp128_ppc_funcs. > > gcc/testsuite/ChangeLog > > 2020-08-10 Carl Love > gcc.target/powerpc/fl128_conversions.c: New file. > --- > gcc/config/rs6000/rs6000.md | 36 +++ > .../gcc.target/powerpc/fp128_conversions.c| 287 ++ > .../config/rs6000/{fixkfti.c => fixkfti-sw.c} | 4 +- > .../rs6000/{fixunskfti.c => fixunskfti-sw.c} | 4 +- > libgcc/config/rs6000/float128-hw.c| 24 ++ > libgcc/config/rs6000/float128-ifunc.c | 44 ++- > libgcc/config/rs6000/float128-sed | 4 + > libgcc/config/rs6000/float128-sed-hw | 4 + > .../rs6000/{floattikf.c => floattikf-sw.c}| 4 +- > .../{floatuntikf.c => floatuntikf-sw.c} | 4 +- > libgcc/config/rs6000/quad-float128.h | 17 +- > libgcc/config/rs6000/t-float128 | 3 +- > 12 files changed, 415 insertions(+), 20 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c > rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%) > rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%) > rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%) > rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%) > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 43b620ae1c0..3853ebd4195 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -6390,6 +6390,42 @@ > xscvsxddp %x0,%x1" >[(set_attr "type" "fp")]) > > +(define_insn "floatti2" > + [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") > + (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))] > + 
"TARGET_POWER10" > +{ > + return "xscvsqqp %0,%1"; > +} > + [(set_attr "type" "fp")]) > + > +(define_insn "floatunsti2" > + [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") > + (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" > "v")))] > + "TARGET_POWER10" > +{ > + return "xscvuqqp %0,%1"; > +} > + [(set_attr "type" "fp")]) > + > +(define_insn "fix_truncti2" > + [(set (match_operand:TI 0 "vsx_register_operand" "=v") > + (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > +{ > + return "xscvqpsqz %0,%1"; > +} > + [(set
[PATCH 1/2, rs6000] int128 sign extension instructions (partial prereq)
Hi, This is a subset of the 128-bit sign extension support patch series, which I believe will be fully implemented in a subsequent patch from Carl. It is a necessary prerequisite for the vector load/store rightmost element patch that follows in this thread. Thanks, -Will gcc/ChangeLog: * config/rs6000/rs6000.md (enum c_enum): Add UNSPEC_EXTENDDITI2 and UNSPEC_MTVSRD_DITI_W1 entries. (mtvsrdd_diti_w1, extendditi2_vector): New define_insns. (extendditi2): New define_expand. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 9c5a228..7d0b296 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -150,10 +150,12 @@ UNSPEC_PLTSEQ UNSPEC_PLT16_HA UNSPEC_CFUGED UNSPEC_CNTLZDM UNSPEC_CNTTZDM + UNSPEC_EXTENDDITI2 + UNSPEC_MTVSRD_DITI_W1 UNSPEC_PDEPD UNSPEC_PEXTD ]) ;; @@ -963,10 +965,41 @@ "" [(set_attr "type" "shift") (set_attr "dot" "yes") (set_attr "length" "4,8")]) +;; Move DI value from GPR to TI mode in VSX register, word 1.
+(define_insn "mtvsrdd_diti_w1" + [(set (match_operand:TI 0 "register_operand" "=wa") + (unspec:TI [(match_operand:DI 1 "register_operand" "r")] + UNSPEC_MTVSRD_DITI_W1))] + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" + "mtvsrdd %x0,0,%1" + [(set_attr "type" "vecsimple")]) + +;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg +(define_insn "extendditi2_vector" + [(set (match_operand:TI 0 "gpc_reg_operand" "=v") +(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")] + UNSPEC_EXTENDDITI2))] + "TARGET_POWER10" + "vextsd2q %0,%1" + [(set_attr "type" "exts")]) + +(define_expand "extendditi2" + [(set (match_operand:TI 0 "gpc_reg_operand") +(sign_extend:TI (match_operand:DI 1 "gpc_reg_operand")))] + "TARGET_POWER10" + { +/* Move 64-bit src from GPR to vector reg and sign extend to 128 bits. */ +rtx temp = gen_reg_rtx (TImode); +emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1])); +emit_insn (gen_extendditi2_vector (operands[0], temp)); +DONE; + } + [(set_attr "type" "exts")]) + (define_insn "extendqi2" [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,?*v") (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,?*v")))] ""
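As a reading aid for the extendditi2 expander: it works in two steps. mtvsrdd with RA=0 places the 64-bit GPR value in the low doubleword of the vector register with the high doubleword zeroed, and vextsd2q then replicates the sign bit of that doubleword across the upper 64 bits. The sketch below models those two steps in portable C, assuming a 128-bit value represented as two 64-bit halves; the struct and function names are illustrative only.

```c
#include <stdint.h>

/* Illustrative 128-bit value as two doublewords.  */
struct u128
{
  uint64_t hi;
  uint64_t lo;
};

/* Model of the two-instruction expansion.  */
struct u128
sign_extend_64_to_128 (int64_t src)
{
  struct u128 r;

  /* Step 1 (mtvsrdd %x0,0,%1): value in the low doubleword, high zeroed.  */
  r.hi = 0;
  r.lo = (uint64_t) src;

  /* Step 2 (vextsd2q): copy the sign bit of the low doubleword upward.  */
  r.hi = (r.lo >> 63) ? UINT64_MAX : 0;
  return r;
}
```

For a negative input the high half becomes all ones; for a non-negative input it stays zero.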
[PATCH 2/2, rs6000] VSX load/store rightmost element operations
Hi, This adds support for the VSX load/store rightmost element operations. This includes the instructions lxvrbx, lxvrhx, lxvrwx, lxvrdx, stxvrbx, stxvrhx, stxvrwx, stxvrdx, and the builtins vec_xl_sext() /* vector load sign extend */, vec_xl_zext() /* vector load zero extend */, and vec_xst_trunc() /* vector store truncate */. Testcase results show that the instructions added with this patch show up at low/no optimization (-O0), with a number of those being replaced with other load and store instructions at higher optimization levels. For consistency I've left the tests at -O0. Regtested OK for Linux on power8, power9 targets. Sniff-regtested OK on power10 simulator. OK for trunk? Thanks, -Will gcc/ChangeLog: * config/rs6000/altivec.h (vec_xl_zext, vec_xl_sext, vec_xst_trunc): New defines. * config/rs6000/rs6000-builtin.def (BU_P10V_OVERLOAD_X): New builtin macro. (BU_P10V_AV_X): New builtin macro. (se_lxvrbx, se_lxvrhx, se_lxvrwx, se_lxvrdx): Define internal names for load and sign extend vector element. (ze_lxvrbx, ze_lxvrhx, ze_lxvrwx, ze_lxvrdx): Define internal names for load and zero extend vector element. (tr_stxvrbx, tr_stxvrhx, tr_stxvrwx, tr_stxvrdx): Define internal names for truncate and store vector element. (se_lxvrx, ze_lxvrx, tr_stxvrx): Define internal names for overloaded load/store rightmost element. * config/rs6000/rs6000-call.c (altivec_builtin_types): Define the internal monomorphs P10_BUILTIN_SE_LXVRBX, P10_BUILTIN_SE_LXVRHX, P10_BUILTIN_SE_LXVRWX, P10_BUILTIN_SE_LXVRDX, P10_BUILTIN_ZE_LXVRBX, P10_BUILTIN_ZE_LXVRHX, P10_BUILTIN_ZE_LXVRWX, P10_BUILTIN_ZE_LXVRDX, P10_BUILTIN_TR_STXVRBX, P10_BUILTIN_TR_STXVRHX, P10_BUILTIN_TR_STXVRWX, P10_BUILTIN_TR_STXVRDX. (altivec_expand_lxvr_builtin): New expansion for load element builtins. (altivec_expand_stv_builtin): Update to support truncate and store builtins. (altivec_expand_builtin): Add cases for the load/store rightmost builtins.
(altivec_init_builtins): Add def_builtin entries for __builtin_altivec_se_lxvrbx, __builtin_altivec_se_lxvrhx, __builtin_altivec_se_lxvrwx, __builtin_altivec_se_lxvrdx, __builtin_altivec_ze_lxvrbx, __builtin_altivec_ze_lxvrhx, __builtin_altivec_ze_lxvrwx, __builtin_altivec_ze_lxvrdx, __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx, __builtin_vec_se_lxvrx, __builtin_vec_ze_lxvrx, __builtin_vec_tr_stxvrx. * config/rs6000/vsx.md (vsx_lxvrx, vsx_stxvrx, vsx_stxvrx): New define_insn entries. * gcc/doc/extend.texi: Add documentation for vsx_xl_sext, vsx_xl_zext, and vec_xst_trunc. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-load-element-extend-char.c: New test. * gcc.target/powerpc/vsx-load-element-extend-int.c: New test. * gcc.target/powerpc/vsx-load-element-extend-longlong.c: New test. * gcc.target/powerpc/vsx-load-element-extend-short.c: New test. * gcc.target/powerpc/vsx-store-element-truncate-char.c: New test. * gcc.target/powerpc/vsx-store-element-truncate-int.c: New test. * gcc.target/powerpc/vsx-store-element-truncate-longlong.c: New test. * gcc.target/powerpc/vsx-store-element-truncate-short.c: New test. diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 8a2dcda..df10a8c 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -234,10 +234,13 @@ #define vec_lde __builtin_vec_lde #define vec_ldl __builtin_vec_ldl #define vec_lvebx __builtin_vec_lvebx #define vec_lvehx __builtin_vec_lvehx #define vec_lvewx __builtin_vec_lvewx +#define vec_xl_zext __builtin_vec_ze_lxvrx +#define vec_xl_sext __builtin_vec_se_lxvrx +#define vec_xst_trunc __builtin_vec_tr_stxvrx #define vec_neg __builtin_vec_neg #define vec_pmsum_be __builtin_vec_vpmsum #define vec_shasigma_be __builtin_crypto_vshasigma /* Cell only intrinsics. 
*/ #ifdef __PPU__ diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index e91a48d..c481e81 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -1143,10 +1143,18 @@ (RS6000_BTC_ ## ATTR/* ATTR */ \ | RS6000_BTC_BINARY), \ CODE_FOR_ ## ICODE) /* ICODE */ #endif +#define BU_P10V_OVERLOAD_X(ENUM, NAME) \ + RS6000_BUILTIN_X (P10_BUILTIN_VEC_ ## ENUM, /* ENUM */ \ + "__builtin_vec_" NAME, /* NAME */ \ + RS6000_BTM_P10, /* MASK */ \ +
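For orientation, here is a small portable C model of what the three builtins in this patch do to a single element, as I read the description: a byte, halfword, word or doubleword is placed in the rightmost position and zero- or sign-extended to 128 bits on load, and truncated back to the element width on store. This is a sketch under that assumption, with hypothetical names; it models values only, not memory access or the builtin signatures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative 128-bit value as two doublewords.  */
struct u128
{
  uint64_t hi;
  uint64_t lo;
};

/* Mask covering the low BITS bits (BITS is 8, 16, 32 or 64).  */
static uint64_t
element_mask (unsigned bits)
{
  return bits == 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
}

/* Model of load-rightmost-element with zero or sign extension.  */
struct u128
extend_element (uint64_t raw, unsigned bits, bool sign)
{
  uint64_t mask = element_mask (bits);
  uint64_t sign_bit = UINT64_C (1) << (bits - 1);
  struct u128 r;

  r.hi = 0;
  r.lo = raw & mask;
  if (sign && (r.lo & sign_bit))
    {
      r.lo |= ~mask;		/* Fill the rest of the low doubleword.  */
      r.hi = UINT64_MAX;	/* And the whole high doubleword.  */
    }
  return r;
}

/* Model of store-rightmost-element with truncation.  */
uint64_t
truncate_element (struct u128 v, unsigned bits)
{
  return v.lo & element_mask (bits);
}
```

Extending and then truncating at the same width returns the original element, which is the round trip the store-truncate tests would be expected to exercise.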
Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote: > Segher, Will: > > Patch 1, adds the 128-bit sign extension instruction support and > corresponding builtin support. > > No changes from the previous version. > > The patch has been tested on > > powerpc64le-unknown-linux-gnu (Power 9 LE) > > with no regression errors. > > Fixed the issues in the ChangeLog noted by Will. > > Carl Love > > --- > > gcc/ChangeLog > > 2020-09-21 Carl Love > * config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define > for new builtins. > * config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL): Add > overloaded builtin definitions. > (VSIGNEXTSB2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D): Add builtin > expansions. +VSIGNEXTSH2W > * config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI, > P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions. > * config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di > visible. > * doc/extend.texi: Add documentation for the vec_signexti and > vec_signextll builtins. > > gcc/testsuite/ChangeLog > > 2020-09-21 Carl Love > * gcc.target/powerpc/p9-sign_extend-runnable.c: New test case. > --- > gcc/config/rs6000/altivec.h | 3 + > gcc/config/rs6000/rs6000-builtin.def | 9 ++ > gcc/config/rs6000/rs6000-call.c | 13 ++ > gcc/config/rs6000/vsx.md | 2 +- > gcc/doc/extend.texi | 15 ++ > .../powerpc/p9-sign_extend-runnable.c | 128 ++ > 6 files changed, 169 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c > > diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h > index 8a2dcda0144..acc365612be 100644 > --- a/gcc/config/rs6000/altivec.h > +++ b/gcc/config/rs6000/altivec.h > @@ -494,6 +494,9 @@ > > #define vec_xlx __builtin_vec_vextulx > #define vec_xrx __builtin_vec_vexturx > +#define vec_signexti __builtin_vec_vsignexti > +#define vec_signextll __builtin_vec_vsignextll > + > #endif > > /* Predicates. 
> diff --git a/gcc/config/rs6000/rs6000-builtin.def > b/gcc/config/rs6000/rs6000-builtin.def > index e91a48ddf5f..4c2e9460949 100644 > --- a/gcc/config/rs6000/rs6000-builtin.def > +++ b/gcc/config/rs6000/rs6000-builtin.def > @@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD, "vprtybd") > BU_P9V_OVERLOAD_1 (VPRTYBQ, "vprtybq") > BU_P9V_OVERLOAD_1 (VPRTYBW, "vprtybw") > BU_P9V_OVERLOAD_1 (VPARITY_LSBB, "vparity_lsbb") > +BU_P9V_OVERLOAD_1 (VSIGNEXTI,"vsignexti") > +BU_P9V_OVERLOAD_1 (VSIGNEXTLL, "vsignextll") > > /* 2 argument functions added in ISA 3.0 (power9). */ > BU_P9_2 (CMPRB, "byte_in_range",CONST, cmprb) > @@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB, "byte_in_range") > BU_P9_OVERLOAD_2 (CMPRB2,"byte_in_either_range") > BU_P9_OVERLOAD_2 (CMPEQB,"byte_in_set") > > +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1. > */ > +BU_P9V_AV_1 (VSIGNEXTSB2W, "vsignextsb2w", CONST, > vsx_sign_extend_qi_v4si) > +BU_P9V_AV_1 (VSIGNEXTSH2W, "vsignextsh2w", CONST, > vsx_sign_extend_hi_v4si) > +BU_P9V_AV_1 (VSIGNEXTSB2D, "vsignextsb2d", CONST, > vsx_sign_extend_qi_v2di) > +BU_P9V_AV_1 (VSIGNEXTSH2D, "vsignextsh2d", CONST, > vsx_sign_extend_hi_v2di) > +BU_P9V_AV_1 (VSIGNEXTSW2D, "vsignextsw2d", CONST, > vsx_sign_extend_si_v2di) > + > /* Builtins for scalar instructions added in ISA 3.1 (power10). 
*/ > BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged) > BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm) > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index a8b520834c7..9e514a01012 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -5527,6 +5527,19 @@ const struct altivec_builtin_types > altivec_overloaded_builtins[] = { > RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, > RS6000_BTI_INTSI, RS6000_BTI_INTSI }, > > + /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 > */ > + { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W, > +RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 }, > + { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W, > +RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 }, > + > + { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D, > +RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 }, > + { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D, > +RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 }, > + { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D, > +RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 }, > + >/* Overloaded built-in functions for ISA3.1 (power10). */ >{ P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB, > RS6000_BTI_V16QI, RS6000
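To illustrate what the new sign-extend builtins compute per element: the byte-to-word form takes the rightmost byte of each 32-bit word and sign-extends it to fill the word, and the word-to-doubleword form does the same for the low word of each doubleword. The sketch below is a portable C model of those two cases with made-up function names; it is not how vec_signexti or vec_signextll are implemented, and it ignores element ordering within the vector.

```c
#include <stdint.h>

/* Model of vextsb2w: sign-extend the low byte of each word.  */
void
vextsb2w_model (const uint32_t in[4], int32_t out[4])
{
  for (int i = 0; i < 4; i++)
    {
      int32_t b = (int32_t) (in[i] & 0xff);
      out[i] = (b ^ 0x80) - 0x80;	/* 0x80..0xff map to -128..-1.  */
    }
}

/* Model of vextsw2d: sign-extend the low word of each doubleword.  */
void
vextsw2d_model (const uint64_t in[2], int64_t out[2])
{
  for (int i = 0; i < 2; i++)
    {
      int64_t w = (int64_t) (in[i] & UINT64_C (0xffffffff));
      out[i] = (w ^ INT64_C (0x80000000)) - INT64_C (0x80000000);
    }
}
```

The xor-and-subtract idiom keeps the arithmetic in signed types with no out-of-range conversions, so the model has fully defined behaviour in standard C.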
Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote: > Segher, Will: > > Add support for converting to/from 128-bit integers and 128-bit > decimal floating point formats. A wordier blurb here clarifying what the patch does would be useful, i.e. this adds support for the dcffixqq and dctfixqq instructions. > > The updates from the previous version of the patch: > > Removed stray ";; carll" comment. > > Removed #if 1 and #endif in the test case. > > Replaced TARGET_TI_VECTOR_OPS with POWER10. > > The patch has been tested on > > powerpc64le-unknown-linux-gnu (Power 9 LE) > > with no regression errors. > > The P10 test was run by hand on Mambo. > > > Carl Love > > > --- > > gcc/ChangeLog > > 2020-09-21 Carl Love > * config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns. > ok. This needs a changelog blurb to reflect the rs6000-call.c changes. (They may have leaked in from a previous or subsequent patch?) > gcc/testsuite/ChangeLog > > 2020-09-21 Carl Love > * gcc.target/powerpc/int_128bit-runnable.c: Update test. ok. > --- > gcc/config/rs6000/dfp.md | 14 + > gcc/config/rs6000/rs6000-call.c | 4 ++ > .../gcc.target/powerpc/int_128bit-runnable.c | 62 +++ > 3 files changed, 80 insertions(+) > > diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md > index 8f822732bac..0e82e315fee 100644 > --- a/gcc/config/rs6000/dfp.md > +++ b/gcc/config/rs6000/dfp.md > @@ -222,6 +222,13 @@ >"dcffixq %0,%1" >[(set_attr "type" "dfp")]) > > +(define_insn "floattitd2" > + [(set (match_operand:TD 0 "gpc_reg_operand" "=d") > + (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))] > + "TARGET_POWER10" > + "dcffixqq %0,%1" > + [(set_attr "type" "dfp")]) > + > ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer. > ;; This is the first stage of converting it to an integer type.
> > @@ -241,6 +248,13 @@ >"TARGET_DFP" >"dctfix %0,%1" >[(set_attr "type" "dfp")]) > + > +(define_insn "fixtdti2" > + [(set (match_operand:TI 0 "gpc_reg_operand" "=v") > + (fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))] > + "TARGET_POWER10" > + "dctfixqq %0,%1" > + [(set_attr "type" "dfp")]) > > ;; Decimal builtin support > > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index e1d9c2e8729..9c50cd3c5a7 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -4967,6 +4967,8 @@ const struct altivec_builtin_types > altivec_overloaded_builtins[] = { > RS6000_BTI_bool_V2DI, 0 }, >{ P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P, > RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 }, > + { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P, > +RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 > }, > >{ P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P, > RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, > @@ -5074,6 +5076,8 @@ const struct altivec_builtin_types > altivec_overloaded_builtins[] = { > RS6000_BTI_bool_V2DI, 0 }, >{ P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P, > RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 }, > + { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P, > +RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 > }, >{ P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P, > RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, >{ P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P, > diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c > b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c > index 85ad544e22b..ec3dcf3dff1 100644 > --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c > +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c > @@ -38,6 +38,7 @@ > #if DEBUG > #include > #include > +#include > > > void print_i128(__int128_t val) > @@ -59,6 +60,13 @@ int main () >__int128_t 
arg1, result; >__uint128_t uarg2; > > + _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128; > + > + struct conv_t { > +__uint128_t u128; > +_Decimal128 d128; > + } conv, conv2; > + >vector signed long long int vec_arg1_di, vec_arg2_di; >vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di; >vector unsigned long long int vec_uresult_di; > @@ -2249,6 +2257,60 @@ int main () > abort(); > #endif >} > + > + /* DFP to __int128 and __int128 to DFP conversions */ > + /* Can't get printing of DFP values to work. Print the DFP value as an > + unsigned int so we can see the bit patterns. */ > + conv.u128 = 0x2208ULL; > + conv.u128 = (conv.u128 << 64) | 0x4ULL; //DFP bit pattern for integer 4 > + expected_result_dfp128 = conv.d128; > > + arg1 = 4; > + > + conv.d128 = (_Decimal128) arg1
Re: [PATCH 2/5] RS6000 add 128-bit Integer Operations
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote: > Will, Segher: > > Add support for divide, modulo, shift, compare of 128-bit > integers instructions and builtin support. > > The following are the changes from the previous version of the patch. > > The TARGET_TI_VECTOR_OPS was removed per comments for patch 3. Just > using TARGET_POWER10. > > Removed extra comment. > > Note the change > > -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b))) > +#define > vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c))) > > is a bug fix. Added missing comment to ChangeLog. > > Removed vector_eqv1ti, used eqvv1ti3 instead. > > Test case, put ppc_native_128bit in the dg-require command. > > The patch has been tested on > > powerpc64le-unknown-linux-gnu (Power 9 LE) > > with no regression errors. > > The P10 test was run by hand on Mambo. > > > Carl Love > > > --- > > gcc/ChangeLog > > 2020-09-21 Carl Love > * config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define > for new builtins. > (vec_rlnm): Fix bug in argument generation. If there is delay, this bugfix could (and probably should) be broken out into it's own patch. > * config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD, > UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs. > (altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud, > altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq, > altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm, > altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq, > altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New > define_insn. altivec_vrlqnm, altivec_vrlqmi should be in the define_expand list. > (vec_widen_umult_even_v2di, vec_widen_smult_even_v2di, > vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi, > altivec_vrlqnm): New define_expands. Actually, they are here... so just need to be removed from the define_insn list. 
:-) > * config/rs6000/rs6000-builtin.def (BU_P10_P, BU_P10_128BIT_1, > BU_P10_128BIT_2, BU_P10_128BIT_3): New macro definitions. Question below. > (VCMPEQUT_P, VCMPGTST_P, VCMPGTUT_P): Add macro expansions. > (VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI, > CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P, > VCMPAET_P): New macro expansions. > (VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, VSLQ, > VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI, > MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions. > (VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions. > * config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT, > P10_BUILTIN_CMPGE_1TI, P10_BUILTIN_CMPGE_U1TI, > P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPGTST, > P10_BUILTIN_CMPLE_1TI, P10_BUILTIN_VCMPLE_U1TI, > P10_BUILTIN_128BIT_DIV_V1TI, P10_BUILTIN_128BIT_UDIV_V1TI, > P10_BUILTIN_128BIT_VMULESD, P10_BUILTIN_128BIT_VMULEUD, > P10_BUILTIN_128BIT_VMULOSD, P10_BUILTIN_128BIT_VMULOUD, > P10_BUILTIN_VNOR_V1TI, P10_BUILTIN_VNOR_V1TI_UNS, > P10_BUILTIN_128BIT_VRLQ, P10_BUILTIN_128BIT_VRLQMI, > P10_BUILTIN_128BIT_VRLQNM, P10_BUILTIN_128BIT_VSLQ, > P10_BUILTIN_128BIT_VSRQ, P10_BUILTIN_128BIT_VSRAQ, > P10_BUILTIN_VCMPGTUT_P, P10_BUILTIN_VCMPGTST_P, > P10_BUILTIN_VCMPEQUT_P, P10_BUILTIN_VCMPGTUT_P, > P10_BUILTIN_VCMPGTST_P, P10_BUILTIN_CMPNET, > P10_BUILTIN_VCMPNET_P, P10_BUILTIN_VCMPAET_P, > P10_BUILTIN_128BIT_VSIGNEXTSD2Q, P10_BUILTIN_128BIT_DIVES_V1TI, > P10_BUILTIN_128BIT_MODS_V1TI, P10_BUILTIN_128BIT_MODU_V1TI): > New overloaded definitions. Looks like those should (all?) now be P10V_BUILTIN_. > (int_ftype_int_v1ti_v1ti) [P10_BUILTIN_VCMPEQUT, ? That appeasr to be the (rs6000_gimple_fold_builtin) function. > P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI, > P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT, > P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI, > P10_BUILTIN_CMPLE_U1TI, E_V1TImode]: New case statements. Also should be P10V_BUILTIN_ ? 
I see both P10_BUILTIN_CMPNET and P10V_BUILTIN_CMPNET references in the patch. > (int_ftype_int_v1ti_v1ti) [bool_V1TI_type_node, > int_ftype_int_v1ti_v1ti]: > New assignments. Thats in the (rs6000_init_builtins) function. > (altivec_init_builtins): New E_V1TImode case statement. > (builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD, > P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI, > P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI, > P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements. May need a refresh with respect to the P10_BUILTIN_ vs P10V_BUILTIN_ ... > * config/rs6000/r6000.c (rs6000_option_override
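Note on the vec_rlnm change quoted in this message: the removed and added macro bodies differ only in which of the two arguments is shifted into the upper field before the OR. The sketch below is a per-element scalar model of that packing, not the builtin itself. The helper names are hypothetical, and the 8-bit bound on the low field is an assumption made here so the two fields do not overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the second argument handed to the
   builtin.  Assumption: the value placed in the low field fits in
   8 bits, so the two fields never overlap.  */

/* Packing as in the removed macro body: ((c) << 8) | (b).  */
static uint32_t pack_c_high (uint32_t b, uint32_t c)
{
  assert (b <= 0xff);
  return (c << 8) | b;
}

/* Packing as in the added macro body: ((b) << 8) | (c).  */
static uint32_t pack_b_high (uint32_t b, uint32_t c)
{
  assert (c <= 0xff);
  return (b << 8) | c;
}
```

The two helpers agree only when b equals c, which may be why a test using identical values for both would not expose the difference.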
Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.
On Mon, 2020-09-21 at 16:57 -0700, Carl Love wrote: > Segher, Will: > > Patch 5 adds the 128-bit integer to/from 128-floating point > conversions. This patch has to invoke the routines to use the 128- > bit > hardware instructions if on Power 10 or use software routines if > running on a pre Power 10 system via the resolve function. > > Add ifunc resolves for __fixkfti, __floatuntikf_sw, __fixkfti_swn, > __fixunskfti_sw. > > The following changes were made to the previous version of the > patch: > > Fixed typos in ChangeLog noted by Will. > > Turned off debug in test case. > > Removed extra blank lines, fixed spacing of #else in the test case. > > Added comment to fixunskfti-sw.c about changes made from the original > file fixunskfti.c. > > The patch has been tested on > > powerpc64le-unknown-linux-gnu (Power 9 LE) > > with no regression errors. > > The P10 tests were run by hand on Mambo. > > Carl Love > - > > gcc/ChangeLog > > 2020-09-21 Carl Love > config/rs6000/rs6000.md (floatti2, floatunsti2, > fix_truncti2, fixuns_truncti2): Add > define_insn for mode IEEE 128. ok > libgcc/config/rs6000/fixkfi-sw.c: New file. > libgcc/config/rs6000/fixkfi.c: Remove file. Should that be fixkfti-sw.c (missing t). Adjust to indicate this is a rename libgcc/config/rs6000/fixkfti.c: Rename to libgcc/config/rs6000/fixkfti-sw.c > libgcc/config/rs6000/fixunskfti-sw.c: New file. > libgcc/config/rs6000/fixunskfti.c: Remove file. > libgcc/config/rs6000/float128-hw.c (__floattikf_hw, > __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): > New functions. > libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): > New macro. > (__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve, > __fixunskfti_resolve): Add resolve functions. > (__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New > functions. > libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf, > __fixtfti, __fixunstfti): Add editor commands to change > names. 
> libgcc/config/rs6000/float128-sed-hw (__floattitf, > __floatuntitf, __fixtfti, __fixunstfti): Add editor commands > to change names. > libgcc/config/rs6000/floattikf-sw.c: New file. > libgcc/config/rs6000/floattikf.c: Remove file. > libgcc/config/rs6000/floatuntikf-sw.c: New file. > libgcc/config/rs6000/floatuntikf.c: Remove file. > libgcc/config/rs6000/floatuntikf-sw.c: New file. > libgcc/config/rs6000/quaad-float128.h (__floattikf_sw, > __floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, > __floattikf_hw, > __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf, > __floatuntikf, __fixkfti, __fixunskfti): New extern > declarations. > libgcc/config/rs6000/t-float128 (floattikf, floatuntikf, > fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs. > (floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add > file names to fp128_ppc_funcs. > > gcc/testsuite/ChangeLog > > 2020-09-21 Carl Love > gcc.target/powerpc/fl128_conversions.c: New file. > --- > gcc/config/rs6000/rs6000.md | 36 +++ > .../gcc.target/powerpc/fp128_conversions.c| 286 > ++ > .../config/rs6000/{fixkfti.c => fixkfti-sw.c} | 4 +- > .../rs6000/{fixunskfti.c => fixunskfti-sw.c} | 7 +- > libgcc/config/rs6000/float128-hw.c| 24 ++ > libgcc/config/rs6000/float128-ifunc.c | 44 ++- > libgcc/config/rs6000/float128-sed | 4 + > libgcc/config/rs6000/float128-sed-hw | 4 + > .../rs6000/{floattikf.c => floattikf-sw.c}| 4 +- > .../{floatuntikf.c => floatuntikf-sw.c} | 4 +- > libgcc/config/rs6000/quad-float128.h | 17 +- > libgcc/config/rs6000/t-float128 | 3 +- > 12 files changed, 417 insertions(+), 20 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/powerpc/fp128_conversions.c > rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%) > rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%) > rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%) > rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} > (96%) > > diff --git 
a/gcc/config/rs6000/rs6000.md > b/gcc/config/rs6000/rs6000.md > index 694ff70635e..5db5d0b4505 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -6390,6 +6390,42 @@ > xscvsxddp %x0,%x1" >[(set_attr "type" "fp")]) > > +(define_insn "floatti2" > + [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") > + (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" > "v")))] > + "TARGET_POWER10" > +{ > + return "xscvsqqp %0,%1"; > +} > + [(set_attr "type" "fp")]) > + > +(define_insn "floatunsti2" > + [(set (match_operand:IEEE128 0 "
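Note on the resolver approach described in this message: the idea is to choose between a hardware and a software routine once, based on what the running system supports. The sketch below models that choice with a plain function pointer. It is not the libgcc ifunc code; every name in it (conv_sw, conv_hw, resolve_conv, convert) is hypothetical, and the capability flag stands in for whatever check the real resolver performs.

```c
#include <assert.h>
#include <stddef.h>

typedef long long (*conv_fn) (double);

/* Hypothetical stand-ins for the software and hardware routines.
   Both compute the same result; only the implementation differs.  */
static long long conv_sw (double x) { return (long long) x; }
static long long conv_hw (double x) { return (long long) x; }

/* Pick an implementation from a capability flag (an assumption here;
   a real resolver would query the platform).  */
static conv_fn resolve_conv (int have_hw)
{
  return have_hw ? conv_hw : conv_sw;
}

/* Convert through whichever routine the resolver selected.  */
static long long convert (int have_hw, double x)
{
  conv_fn fn = resolve_conv (have_hw);
  assert (fn != NULL);
  return fn (x);
}
```

Callers see a single entry point and get the same answer on either path, which is the property the _sw and _hw variants are meant to preserve.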
Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
>
> Patch 4 adds the vector 128-bit integer shift instruction support for
> the V1TI type.
>
> The following changes were made from the previous version.
>
> Renamed VSX_TI to VEC_TI, put def in vector.md.  Didn't get it
> separated into a different patch.
>
> Reworked the XXSWAPD_V1TI to not use UNSPEC.
>
> Test suite program cleanups, removed "//" comments that were not
> needed.
>
> The patch has been tested on
>
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>
> with no regression errors.
>
> The P10 test was run by hand on Mambo.
>
>
> Carl Love
>
> --
>
>
> gcc/ChangeLog
>
> 2020-09-21  Carl Love
> 	* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
> 	Rename altivec_vslq_, altivec_vsrq_, mode VEC_TI.

Nit "Rename to"

> 	* config/rs6000/vector.md (VEC_TI): New mode iterator.
> 	(vashlv1ti3): Change to vashl3, mode VEC_TI.
> 	(vlshrv1ti3): Change to vlshr3, mode VEC_TI.

s/Change/Rename to/

'New' isn't quite right for the mode iterator, since it's renamed from
the VSX_TI iterator.

perhaps something like
	* config/rs6000/vector.md (VEC_TI): New name for VSX_TI iterator
	from vsx.md.

> 	* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator.
> 	(VSX_TI): Renamed VEC_TI.

Just the Remove.  VEC_TI doesn't exist in vsx.md.

>
> gcc/testsuite/ChangeLog
>
> 2020-09-21  Carl Love
> 	gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
> 	tests.
> --- > gcc/config/rs6000/altivec.md | 16 - > gcc/config/rs6000/vector.md | 27 --- > gcc/config/rs6000/vsx.md | 33 +-- > > .../gcc.target/powerpc/int_128bit-runnable.c | 16 +++-- > 4 files changed, 52 insertions(+), 40 deletions(-) > > diff --git a/gcc/config/rs6000/altivec.md > b/gcc/config/rs6000/altivec.md > index 34a4731342a..5db3de3cc9f 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -2219,10 +2219,10 @@ >"vsl %0,%1,%2" >[(set_attr "type" "vecsimple")]) > > -(define_insn "altivec_vslq" > - [(set (match_operand:V1TI 0 "vsx_register_operand" "=v") > - (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v") > - (match_operand:V1TI 2 "vsx_register_operand" > "v")))] > +(define_insn "altivec_vslq_" > + [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v") > + (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" > "v") > + (match_operand:VEC_TI 2 "vsx_register_operand" > "v")))] >"TARGET_POWER10" >/* Shift amount in needs to be in bits[57:63] of 128-bit operand. > */ >"vslq %0,%1,%2" > @@ -2236,10 +2236,10 @@ >"vsr %0,%1,%2" >[(set_attr "type" "vecsimple")]) > > -(define_insn "altivec_vsrq" > - [(set (match_operand:V1TI 0 "vsx_register_operand" "=v") > - (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" > "v") > -(match_operand:V1TI 2 "vsx_register_operand" > "v")))] > +(define_insn "altivec_vsrq_" > + [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v") > + (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" > "v") > +(match_operand:VEC_TI 2 > "vsx_register_operand" "v")))] >"TARGET_POWER10" >/* Shift amount in needs to be in bits[57:63] of 128-bit operand. 
> */ >"vsrq %0,%1,%2" > diff --git a/gcc/config/rs6000/vector.md > b/gcc/config/rs6000/vector.md > index 0cca4232619..3ea3a91845a 100644 > --- a/gcc/config/rs6000/vector.md > +++ b/gcc/config/rs6000/vector.md > @@ -26,6 +26,9 @@ > ;; Vector int modes > (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI]) > > +;; 128-bit int modes > +(define_mode_iterator VEC_TI [V1TI TI]) > + > ;; Vector int modes for parity > (define_mode_iterator VEC_IP [V8HI > V4SI > @@ -1627,17 +1630,17 @@ >"") > > ;; No immediate version of this 128-bit instruction > -(define_expand "vashlv1ti3" > - [(set (match_operand:V1TI 0 "vsx_register_operand" "=v") > - (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v") > - (match_operand:V1TI 2 "vsx_register_operand" > "v")))] > +(define_expand "vashl3" > + [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v") > + (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand") > + (match_operand:VEC_TI 2 > "vsx_register_operand")))] >"TARGET_POWER10" > { >/* Shift amount in needs to be put in bits[57:63] of 128-bit > operand2. */ > - rtx tmp = gen_reg_rtx (V1TImode); > + rtx tmp = gen_reg_rtx (mode); > >emit_insn(gen_xxswapd_v1ti (tmp, operands[2])); > - emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp)); > + emit_insn(gen_altivec_vslq_ (operands[0], operands[1], > tmp)); >DONE; > }) > > @@ -1
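Note on the shift-amount comments in the quoted patterns: they say the amount must sit in bits[57:63] of the 128-bit operand, which is a 7-bit field, so only amounts 0 to 127 are meaningful. The sketch below is a scalar model of a 128-bit left shift that honours such a 7-bit amount. The u128_t type and shl128 helper are hypothetical, and the model does not reproduce where the field lives within the register, which is what the xxswapd step in the expander deals with.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves.  */
typedef struct { uint64_t hi, lo; } u128_t;

/* Shift left by the low 7 bits of AMOUNT; higher bits are ignored,
   mirroring a 7-bit shift-amount field.  */
static u128_t shl128 (u128_t v, unsigned amount)
{
  unsigned n = amount & 0x7f;
  u128_t r;

  assert (n < 128);
  if (n == 0)
    r = v;                      /* Avoid an undefined 64-bit shift by 64.  */
  else if (n < 64)
    {
      r.hi = (v.hi << n) | (v.lo >> (64 - n));
      r.lo = v.lo << n;
    }
  else
    {
      r.hi = v.lo << (n - 64);  /* Low half moves entirely into the high half.  */
      r.lo = 0;
    }
  return r;
}
```

Splitting the value into halves keeps the sketch portable to compilers without a native 128-bit integer type.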
[PATCH, rs6000] correct an erroneous BTM value in the BU_P10_MISC define
[PATCH, rs6000] correct an erroneous blip in the BU_P10_MISC define

Hi,
  We have an extraneous BTM entry (RS6000_BTM_POWERPC64) in the define for
our P10 MISC 2 builtin definition.  This does not exist for the '0',
'1' or '3' definitions.  It appears to me that this was erroneously
copied from the P7 version of the define, which contains a version of the
BU macro both with and without that element.  Removing the
RS6000_BTM_POWERPC64 portion of the define does not introduce any obvious
failures, so I believe this extra line can be safely removed.

OK for trunk?

Thanks
-Will

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5fe..62c9b77cb76d 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1112,12 +1112,11 @@
                     CODE_FOR_ ## ICODE)                 /* ICODE */
 
 #define BU_P10_MISC_2(ENUM, NAME, ATTR, ICODE)                          \
   RS6000_BUILTIN_2 (P10_BUILTIN_ ## ENUM,               /* ENUM */      \
                     "__builtin_" NAME,                  /* NAME */      \
-                    RS6000_BTM_P10                                      \
-                    | RS6000_BTM_POWERPC64,             /* MASK */      \
+                    RS6000_BTM_P10,                     /* MASK */      \
                     (RS6000_BTC_ ## ATTR                /* ATTR */      \
                      | RS6000_BTC_BINARY),                              \
                     CODE_FOR_ ## ICODE)                 /* ICODE */
 
 #define BU_P10_MISC_3(ENUM, NAME, ATTR, ICODE)                          \
Re: [EXTERNAL] Re: [PATCH 2/2, rs6000] VSX load/store rightmost element operations
On Thu, 2020-09-24 at 19:40 -0500, Segher Boessenkool wrote:
> On Thu, Sep 24, 2020 at 11:04:38AM -0500, will schmidt wrote:
> > [PATCH 2/2, rs6000] VSX load/store rightmost element operations
> >
> > Hi,
> >   This adds support for the VSX load/store rightmost element operations.
> > This includes the instructions lxvrbx, lxvrhx, lxvrwx, lxvrdx,
> > stxvrbx, stxvrhx, stxvrwx, stxvrdx; And the builtins
> > vec_xl_sext() /* vector load sign extend */
> > vec_xl_zext() /* vector load zero extend */
> > vec_xst_trunc() /* vector store truncate */.
> >
> > Testcase results show that the instructions added with this patch show
> > up at low/no optimization (-O0), with a number of those being replaced
> > with other load and store instructions at higher optimization levels.
> > For consistency I've left the tests at -O0.
> >
> > Regtested OK for Linux on power8,power9 targets.  Sniff-regtested OK on
> > power10 simulator.
> > OK for trunk?
> >
> > Thanks,
> > -Will
> >
> > gcc/ChangeLog:
> > 	* config/rs6000/altivec.h (vec_xl_zest, vec_xl_sext, vec_xst_trunc): New
> > 	defines.
>
> vec_xl_zext (no humour there :-) ).

Lol.. one of them slipped through.. my muscle memory struggled on
typing these.. :-)

>
> > +BU_P10V_OVERLOAD_X (SE_LXVRX, "se_lxvrx")
> > +BU_P10V_OVERLOAD_X (ZE_LXVRX, "ze_lxvrx")
> > +BU_P10V_OVERLOAD_X (TR_STXVRX, "tr_stxvrx")
>
> I'm not a fan of the cryptic names.  I guess I'll get used to them ;-)
>
> > + if (op0 == const0_rtx)
> > + addr = gen_rtx_MEM (blk ? BLKmode : tmode, op1);
>
> That indent is broken.
>
> > + else
> > + {
> > + op0 = copy_to_mode_reg (mode0, op0);
>
> And so is this.  Should be two spaces, not three.
>
> > + addr = gen_rtx_MEM (blk ? BLKmode : smode,
> > + gen_rtx_PLUS (Pmode, op1, op0));
>
> "gen_rtx_PLUS" should line up with "blk".
>
> > + if (sign_extend)
> > +{
> > + rtx discratch = gen_reg_rtx (DImode);
> > + rtx tiscratch = gen_reg_rtx (TImode);
>
> More broken indentation.  (And more later.)
>
> > + // emit the lxvr*x insn.
>
> Use only /* comments */ please, don't mix them.  Emit with a capital E.
>
> > + pat = GEN_FCN (icode) (tiscratch, addr);
> > + if (! pat)
>
> No space after "!" (or any other unary op other than casts and sizeof
> and the like).
>
> > + // Emit a sign extention from QI,HI,WI to double.
>
> "extension"

will do, thanks

>
> > +;; Store rightmost element into store_data
> > +;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> > +(define_insn "vsx_stxvrx"
> > + [(set
> > + (match_operand:INT_ISA3 0 "memory_operand" "=Z")
> > + (truncate:INT_ISA3 (match_operand:TI 1 "vsx_register_operand" "wa")))]
> > + "TARGET_POWER10"
> > + "stxvrx %1,%y0"
>
> %x1 I think?

I'll doublecheck.

>
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-char.c
> > @@ -0,0 +1,168 @@
> > +/*
> > + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> > + * vector element and zero/sign extend). */
> > +
> > +/* { dg-do compile {target power10_ok} } */
> > +/* { dg-do run {target power10_hw} } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
>
> If you dg_require it, why test it on the "dg-do compile" line?  It will
> *work* with it of course, but it is puzzling :-)

I've had both compile-time and run-time versions of the test.  In this
case I wanted to try to handle both, so compile when I can compile it,
and run when I can run it, etc.  If that combo doesn't work the way I
expect it to, I'll need to split them out into separate tests.

>
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-int.c
> > @@ -0,0 +1,165 @@
> > +/*
> > + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> > + * vector element and zero/sign extend). */
> > +
> > +/* { dg-do compile {target power10_ok} } */
> > +/* { dg-do run {target power10_hw} } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O0" } */
>
> Please comment here what that -O0 is for?  So that we still know when we
> read it decades from now ;-)

I've got it commented at least once, I'll make sure to get all the
instances covered.

>
> > +/* { dg-final { scan-assembler-times {\mlxvrwx\M} 2 } } */
> > +/* { dg-final { scan-assembler-times {\mlwax\M} 0 } } */
>
> Maybe all of {\mlwa} here?

lwax was sufficient for what I sniff-tested.  I'll double-check.

Thanks,
-Will

>
>
> Segher
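Note on the sign-extend versus zero-extend loads discussed in this thread: the two builtins differ only in how the loaded element is widened. The sketch below is a scalar model for a single byte widened to 64 bits. The real operations widen to 128 bits in a vector register, and the helper names load_sext8 and load_zext8 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load one byte and sign-extend it: bit 7 is copied into all
   higher bits of the result.  */
static int64_t load_sext8 (const void *p)
{
  int8_t b;

  assert (p != NULL);
  memcpy (&b, p, sizeof b);
  return (int64_t) b;
}

/* Load one byte and zero-extend it: all higher bits are cleared.  */
static uint64_t load_zext8 (const void *p)
{
  uint8_t b;

  assert (p != NULL);
  memcpy (&b, p, sizeof b);
  return (uint64_t) b;
}
```

The two results differ only when the top bit of the loaded element is set, so test data for such builtins generally needs at least one value with that bit set.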
Re: [PATCH, rs6000] correct an erroneous BTM value in the BU_P10_MISC define
On Fri, 2020-09-25 at 12:36 -0500, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Sep 24, 2020 at 03:35:24PM -0500, will schmidt wrote:
> > We have extraneous BTM entry (RS6000_BTM_POWERPC64) in the define for
> > our P10 MISC 2 builtin definition.  This does not exist for the '0',
> > '1' or '3' definitions.  It appears to me that this was erroneously
> > copied from the P7 version of the define which contains a version of the
> > BU macro both with and without that element.  Removing the
> > RS6000_BTM_POWERPC64 portion of the define does not introduce any obvious
> > failures, I believe this extra line can be safely removed.
>
> No, it cannot.
>
> This is used for pdepd/pextd/cntlzdm/cnttzdm/cfuged, all of which do
> need 64-bit registers to do anything sane.
>
> This should really have defined some new builtin class, and I thought we
> could just be tricky and take a massive shortcut.  Bill has been hit by
> this already as well, sigh :-(

Ok.
The usage of that macro seems to be limited to those that you have
referenced.  i.e.

/* Builtins for scalar instructions added in ISA 3.1 (power10).  */
BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
BU_P10_MISC_2 (PDEPD, "pdepd", CONST, pdepd)
BU_P10_MISC_2 (PEXTD, "pextd", CONST, pextd)

So looking at the power7 entries that have the BTM_POWERPC64 entry..

BU_P7_MISC_2 (DIVWE, "divwe", CONST, dive_si)
BU_P7_MISC_2 (DIVWEU, "divweu", CONST, diveu_si)
BU_P7_POWERPC64_MISC_2 (DIVDE, "divde", CONST, dive_di)
BU_P7_POWERPC64_MISC_2 (DIVDEU, "divdeu", CONST, diveu_di)

Would it be suitable to rename the P10 macro to BU_P10_POWERPC64_MISC_2 ?

I'd then debate whether to add an unused macro to fill the gap between
BU_P10_MISC_1 and BU_P10_MISC_2

If you've got schemes for a deeper fix, I'd need another hint. :-)

thanks
-Will

>
>
> Segher
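Note on why the extra mask bit matters: a builtin whose mask ORs together two requirements is meant to be available only when the target satisfies both, so dropping RS6000_BTM_POWERPC64 would expose the 64-bit-only builtins on 32-bit configurations. The sketch below is a simplified model of that gating. The flag values, macro names and the builtin_enabled helper are hypothetical and chosen for illustration; they are not the definitions used in rs6000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch.  */
#define BTM_P10        (UINT64_C (1) << 0)
#define BTM_POWERPC64  (UINT64_C (1) << 1)

/* A builtin is usable only if every bit in its mask is also set in
   the flags describing the current target.  */
static int builtin_enabled (uint64_t builtin_mask, uint64_t target_flags)
{
  assert (builtin_mask != 0);
  return (target_flags & builtin_mask) == builtin_mask;
}
```

Under this model a builtin masked with both bits is rejected on a target that only has the first, while a builtin masked with just BTM_P10 is accepted there.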
Re: [PATCH v2] builtins: rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]
On Fri, 2020-09-04 at 12:52 -0300, Raoni Fassina Firmino via Gcc-patches wrote: > Changes since v1[1]: > - Fixed english spelling; > - Fixed code-style; > - Changed match operand predicate in feclearexcept and feraiseexcept; > - Changed testcase options; > - Minor changes in test code to be C90 compatible; > - Other minor changes sugested by Segher; > - Changed subject line tag (not sure if I tagged correctly or should > include optabs: also) > > There is one pending question raised by Segher, It is about adding > documentation, I am not sure if it is needed and if so, where it > should be. I will quote the relevant part of the conversation[2] from > the v1 thread for context: > > > > > +OPTAB_D (fegetround_optab, "fegetround$a") > > > > +OPTAB_D (feclearexcept_optab, "feclearexcept$a") > > > > +OPTAB_D (feraiseexcept_optab, "feraiseexcept$a") > > >␣ > > > Should those be documented somewhere? (In gcc/doc/ somewhere). > > > > I am lost on that one. I took a look on the docs (I hope looking on the > > online docs was good enough) and I didn't find a place where i feel it > > sits well. On the PowerPC target specific sections (6.60.22 Basic > > PowerPC Built-in Functions), I didn't found it mentioning builtins that > > are optimizations for the standard library functions, but we do have > > many of these for Power. Then, on the generic section (6.59 Other > > Built-in Functions Provided by GCC) it mentions C99 functions that have > > builtins but it seems like it mentions builtins that have target > > independent implementation, or at least it dos not say that some > > builtins may be implemented on only some targets. And in this case > > there is no implementation (for now) for any other target that is not > > PowerPc. > > > > So, I don't know if or where this should be documented. 
> > tested on top of master (c5a6c2237a1156dc43fa32c938fc32acdfc2bff9) > on the following plataforms with no regression: > - powerpc64le-linux-gnu (Power 9) > - powerpc64le-linux-gnu (Power 8) > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552024.html > [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553295.html > > 8< > > This optimizations were originally in glibc, but was removed > and sugested that they were a good fit as gcc builtins[1]. > > The associated bugreport: PR target/94193 > > [1] https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00047.html > https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00080.html > > 2020-08-13 Raoni Fassina Firmino > > gcc/ChangeLog: > > * builtins.c (expand_builtin_fegetround): New function. > (expand_builtin_feclear_feraise_except): New function. > (expand_builtin): Add cases for BUILT_IN_FEGETROUND, > BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT > * config/rs6000/rs6000.md (fegetroundsi): New pattern. > (feclearexceptsi): New Pattern. > (feraiseexceptsi): New Pattern. > * optabs.def (fegetround_optab): New optab. > (feclearexcept_optab): New optab. > (feraiseexcept_optab): New optab. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New test. > * gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New test. > * gcc.target/powerpc/builtin-fegetround.c: New test. > A few cosmetic nits below,.. this is perhaps enough activity to get others to take a look. 
:-) > Signed-off-by: Raoni Fassina Firmino > > FIX: patch_v1 (2) > --- > gcc/builtins.c| 76 ++ > gcc/config/rs6000/rs6000.md | 82 +++ > gcc/optabs.def| 4 + > .../builtin-feclearexcept-feraiseexcept-1.c | 68 + > .../builtin-feclearexcept-feraiseexcept-2.c | 133 ++ > .../gcc.target/powerpc/builtin-fegetround.c | 36 + > 6 files changed, 399 insertions(+) > create mode 100644 > gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c > create mode 100644 > gcc/testsuite/gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c > > diff --git a/gcc/builtins.c b/gcc/builtins.c > index 97f1a184dc6..a6f6141edb7 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -115,6 +115,9 @@ static rtx expand_builtin_mathfn_3 (tree, rtx, rtx); > static rtx expand_builtin_mathfn_ternary (tree, rtx, rtx); > static rtx expand_builtin_interclass_mathfn (tree, rtx); > static rtx expand_builtin_sincos (tree); > +static rtx expand_builtin_fegetround (tree, rtx, machine_mode); > +static rtx expand_builtin_feclear_feraise_except (tree, rtx, machine_mode, > + optab); > static rtx expand_builtin_cexpi (tree, rtx); > static rtx expand_builtin_int_roundingfn (tree, rtx); > static rtx expand_builtin_int_r
Re: [PATCH] rs6000: Fix extraneous characters in the documentation
On Mon, 2020-10-05 at 17:23 -0300, Tulio Magno Quites Machado Filho via Gcc-patches wrote:
> Ping?

+cc Segher  :-)

>
> Tulio Magno Quites Machado Filho via Gcc-patches writes:
>
> > Replace them with a whitespace in order to avoid artifacts in the HTML
> > document.
> >
> > 2020-08-19  Tulio Magno Quites Machado Filho
> >
> > gcc/
> > 	* doc/extend.texi (PowerPC Built-in Functions): Replace
> > 	extraneous characters with whitespace.
> > ---
> >  gcc/doc/extend.texi | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index bcc251481ca..0c380322280 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -21538,10 +21538,10 @@ void amo_stdat_smin (int64_t *, int64_t);
> >  ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
> >  GCC provides support for these instructions through the following built-in
> >  functions which are enabled with the @code{-mmma} option.  The vec_t type
> > -below is defined to be a normal vector unsigned char type.  The uint2, uint4
> > +below is defined to be a normal vector unsigned char type.  The uint2, uint4

That looks like a non-breaking space.  (ascii c2 a0)
so 2e c2 a0 20 becomes 2e 20 20

> >  and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants
> > -respectively.  The compiler will verify that they are constants and that
> > -their values are within range.
> > +respectively.  The compiler will verify that they are constants and that
> > +their values are within range.

2e c2 a0 20 becomes 2e 20 20
And drops a trailing whitespace.

Those seem reasonable.
lgtm

Thanks
-Will

> >
> >  The built-in functions supported are:
> >
> > --
> > 2.25.4
> >
> >
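Note on the byte sequences quoted in the review: c2 a0 is the UTF-8 encoding of U+00A0 (no-break space), so the change amounts to replacing that two-byte sequence with a single ASCII space (20). The sketch below shows one way such a replacement could be done in place on a byte buffer. The function name replace_nbsp is hypothetical, and this is not the tool that was used to produce the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Replace each UTF-8 no-break space (bytes 0xC2 0xA0) in BUF with one
   ASCII space, in place.  Returns the new length, which shrinks by
   one byte per replacement.  */
static size_t replace_nbsp (unsigned char *buf, size_t len)
{
  size_t in = 0, out = 0;

  assert (buf != NULL || len == 0);
  while (in < len)
    {
      if (in + 1 < len && buf[in] == 0xC2 && buf[in + 1] == 0xA0)
        {
          buf[out++] = ' ';     /* Two input bytes become one.  */
          in += 2;
        }
      else
        buf[out++] = buf[in++]; /* Copy every other byte unchanged.  */
    }
  return out;
}
```

Applied to the four bytes 2e c2 a0 20 from the review, this yields the three bytes 2e 20 20.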
[PATCH, rs6000] rename BU_P10_MISC_2 define to BU_P10_POWERPC64_MISC_2
Hi,
  Rename our BU_P10_MISC_2 built-in define macro to be
BU_P10_POWERPC64_MISC_2.  This more accurately reflects that the macro
includes the RS6000_BTM_POWERPC64 entry that is not present in the other
BU_P10_MISC macros, and matches the style we used for the P7 equivalent.

Should be entirely cosmetic, no codegen changes.  A regtest is underway
just in case.

OK for trunk?

Thanks,
-Will

gcc/ChangeLog:

	* gcc/config/rs6000/rs6000-builtin.def (BU_P10_MISC_2): Rename
	to BU_P10_POWERPC64_MISC_2.
	(CFUGED,CNTLZDM,CNTTZDM,PDEPD,PEXTD): Call renamed macro.

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5fe..3eb55f0ae434 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1109,11 +1109,11 @@
                     RS6000_BTM_P10,                     /* MASK */      \
                     (RS6000_BTC_ ## ATTR                /* ATTR */      \
                      | RS6000_BTC_UNARY),                               \
                     CODE_FOR_ ## ICODE)                 /* ICODE */
 
-#define BU_P10_MISC_2(ENUM, NAME, ATTR, ICODE)                          \
+#define BU_P10_POWERPC64_MISC_2(ENUM, NAME, ATTR, ICODE)                \
   RS6000_BUILTIN_2 (P10_BUILTIN_ ## ENUM,               /* ENUM */      \
                     "__builtin_" NAME,                  /* NAME */      \
                     RS6000_BTM_P10                                      \
                     | RS6000_BTM_POWERPC64,             /* MASK */      \
                     (RS6000_BTC_ ## ATTR                /* ATTR */      \
@@ -2725,15 +2725,15 @@ BU_P9_64BIT_2 (CMPEQB, "byte_in_set", CONST, cmpeqb)
 BU_P9_OVERLOAD_2 (CMPRB, "byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2, "byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB, "byte_in_set")
 
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
-BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
-BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
-BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
-BU_P10_MISC_2 (PDEPD, "pdepd", CONST, pdepd)
-BU_P10_MISC_2 (PEXTD, "pextd", CONST, pextd)
+BU_P10_POWERPC64_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
+BU_P10_POWERPC64_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
+BU_P10_POWERPC64_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
+BU_P10_POWERPC64_MISC_2 (PDEPD, "pdepd", CONST, pdepd)
+BU_P10_POWERPC64_MISC_2 (PEXTD, "pextd", CONST, pextd)
 
 /* Builtins for vector instructions added in ISA 3.1 (power10).  */
 BU_P10V_AV_2 (VCLRLB, "vclrlb", CONST, vclrlb)
 BU_P10V_AV_2 (VCLRRB, "vclrrb", CONST, vclrrb)
 BU_P10V_AV_2 (VCFUGED, "vcfuged", CONST, vcfuged)
Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations
On Mon, 2020-10-05 at 11:51 -0700, Carl Love wrote: > Will, Segher: > > Patch 1, adds the 128-bit sign extension instruction support and > corresponding builtin support. > > I updated the change log per the comments from Will. > > Patch has been retested on Power 9 LE. > > Pet me know if it is ready to commit to mainline. > > Carl > > --- > > > gcc/ChangeLog > > 2020-10-05 Carl Love > * config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define > for new builtins. > * config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL): Add > overloaded builtin definitions. > (VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D): > Add builtin expansions. > * config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI, > P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions. > * config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di > visible. > * doc/extend.texi: Add documentation for the vec_signexti and > vec_signextll builtins. > > gcc/testsuite/ChangeLog > > 2020-10-05 Carl Love > * gcc.target/powerpc/p9-sign_extend-runnable.c: New test case. > --- > gcc/config/rs6000/altivec.h | 3 + > gcc/config/rs6000/rs6000-builtin.def | 9 ++ > gcc/config/rs6000/rs6000-call.c | 13 ++ > gcc/config/rs6000/vsx.md | 2 +- > gcc/doc/extend.texi | 15 ++ > .../powerpc/p9-sign_extend-runnable.c | 128 ++ > 6 files changed, 169 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c > > diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h > index f7720d136c9..cfa5eda4cd5 100644 > --- a/gcc/config/rs6000/altivec.h > +++ b/gcc/config/rs6000/altivec.h > @@ -494,6 +494,9 @@ > > #define vec_xlx __builtin_vec_vextulx > #define vec_xrx __builtin_vec_vexturx > +#define vec_signexti __builtin_vec_vsignexti > +#define vec_signextll __builtin_vec_vsignextll > + > #endif Can probably drop that blank line. > > /* Predicates. 
> diff --git a/gcc/config/rs6000/rs6000-builtin.def > b/gcc/config/rs6000/rs6000-builtin.def > index e91a48ddf5f..4c2e9460949 100644 > --- a/gcc/config/rs6000/rs6000-builtin.def > +++ b/gcc/config/rs6000/rs6000-builtin.def > @@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD, "vprtybd") > BU_P9V_OVERLOAD_1 (VPRTYBQ, "vprtybq") > BU_P9V_OVERLOAD_1 (VPRTYBW, "vprtybw") > BU_P9V_OVERLOAD_1 (VPARITY_LSBB, "vparity_lsbb") > +BU_P9V_OVERLOAD_1 (VSIGNEXTI,"vsignexti") > +BU_P9V_OVERLOAD_1 (VSIGNEXTLL, "vsignextll") > > /* 2 argument functions added in ISA 3.0 (power9). */ > BU_P9_2 (CMPRB, "byte_in_range",CONST, cmprb) > @@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB, "byte_in_range") > BU_P9_OVERLOAD_2 (CMPRB2,"byte_in_either_range") > BU_P9_OVERLOAD_2 (CMPEQB,"byte_in_set") > > +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1. > */ I have mixed feelings about straddling the ISA 3.0 and 3.1 ; but not sure how to properly improve. (I defer). The rest LGTM, Thanks -Will > +BU_P9V_AV_1 (VSIGNEXTSB2W, "vsignextsb2w", CONST, > vsx_sign_extend_qi_v4si) > +BU_P9V_AV_1 (VSIGNEXTSH2W, "vsignextsh2w", CONST, > vsx_sign_extend_hi_v4si) > +BU_P9V_AV_1 (VSIGNEXTSB2D, "vsignextsb2d", CONST, > vsx_sign_extend_qi_v2di) > +BU_P9V_AV_1 (VSIGNEXTSH2D, "vsignextsh2d", CONST, > vsx_sign_extend_hi_v2di) > +BU_P9V_AV_1 (VSIGNEXTSW2D, "vsignextsw2d", CONST, > vsx_sign_extend_si_v2di) > + > /* Builtins for scalar instructions added in ISA 3.1 (power10). 
*/ > BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged) > BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm) > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index a8b520834c7..9e514a01012 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -5527,6 +5527,19 @@ const struct altivec_builtin_types > altivec_overloaded_builtins[] = { > RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, > RS6000_BTI_INTSI, RS6000_BTI_INTSI }, > > + /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 > */ > + { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W, > +RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 }, > + { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W, > +RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 }, > + > + { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D, > +RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 }, > + { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D, > +RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 }, > + { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D, > +RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 }, > + >
Re: [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments
On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote:
> Will, Segher:
>
> The following changes were made from the previous version:
>
> Per Will's comments, I split the bug fix from patch 2 into a separate
> patch. This patch is the bug fix for the vec_rlnm builtin.

I recommend trying to keep a clean paragraph appropriate for the commit
log, separate from the history of the patch.
i.e.
"This patch fixes an error in how the vec_rlnm() builtin parameters were
handled."

> Regression tests reran on Power 9 LE with no regression errors.
>
> Please let me know if it looks OK to commit to mainline.
>
> Carl
>
> --
>
> gcc/ChangeLog
>
> 2020-10-05 Carl Love
>
> * config/rs6000/altivec.h (vec_rlnm): Fix bug in argument generation.

Is there any testcase impact? (If this doesn't fix an existing test,
may be worth adding one..)

LGTM,
Thanks
-Will

> ---
> gcc/config/rs6000/altivec.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index 8a2dcda0144..f7720d136c9 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -183,7 +183,7 @@
>  #define vec_recipdiv __builtin_vec_recipdiv
>  #define vec_rlmi __builtin_vec_rlmi
>  #define vec_vrlnm __builtin_vec_rlnm
> -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
>  #define vec_rsqrt __builtin_vec_rsqrt
>  #define vec_rsqrte __builtin_vec_rsqrte
>  #define vec_signed __builtin_vec_vsigned
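On the testcase question, a rough scalar sketch of what the one-line fix changes may help. It models only the bit arithmetic of the macro's second argument, not the instruction or the vector types. The function names are hypothetical, and the 8-bit limit on each field is an assumption made here so the two fields do not overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the second argument built by the fixed vec_rlnm macro:
   ((b) << 8) | (c).  B lands in bits 8..15 and C in bits 0..7.
   Hypothetical name; the 8-bit range check is an assumption of this
   sketch, not something the macro enforces.  */
uint32_t
pack_rlnm_fixed (uint32_t b, uint32_t c)
{
  assert (b <= 0xff && c <= 0xff);
  return (b << 8) | c;
}

/* The same model for the ordering used before the fix:
   ((c) << 8) | (b), i.e. the two fields swapped.  */
uint32_t
pack_rlnm_old (uint32_t b, uint32_t c)
{
  assert (b <= 0xff && c <= 0xff);
  return (c << 8) | b;
}
```

With b = 0x12 and c = 0x34 the fixed form gives 0x1234 and the old form gives 0x3412, so any test that passes distinct b and c values should be able to tell the two apart.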
Re: [PATCH 2b/5] RS6000 add 128-bit Integer Operations
On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote: > Will and Segher: > > This is the rest of the second patch which adds the 128-bit integer > support for divide, modulo, shift, compare of 128-bit > integers instructions and builtin support. > > In the last round of changes, the flag for the 128-bit operations was > removed. Per Will's comments, the BU_P10_128BIT_* builtin definitions > can be removed. Instead we can just use P10V_BUILTIN. Similarly for > the BU_P10_P builtin definition. The commit log was updated to reflect > the change. There were a few change log entries for the 128-bit > operations flag that needed removing. As well as other fixes noted by > Will. > > The changes are all name changes not functional changes. > > No regression failures were found when run on a P9. > > Please let me know if this is ready for mainline. > >Carl > > > > gcc/ChangeLog > > 2020-10/05 Carl Love > * config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define > for new builtins. > * config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD, > UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs. > (altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud, > altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq, > altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm, > altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq, > altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New > define_insn. > (vec_widen_umult_even_v2di, vec_widen_smult_even_v2di, > vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi, > altivec_vrlqnm): New define_expands. > * config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P, > VCMPGTUT_P): Add macro expansions. 
> (VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI, > CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P, > VCMPAET_P, VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, > VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI, > MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions. > (VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions. > * config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT, > P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI, > P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST, > P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI, > P10V_BUILTIN_128BIT_DIV_V1TI, P10V_BUILTIN_128BIT_UDIV_V1TI, > P10V_BUILTIN_128BIT_VMULESD, P10V_BUILTIN_128BIT_VMULEUD, > P10V_BUILTIN_128BIT_VMULOSD, P10V_BUILTIN_128BIT_VMULOUD, Just sniff-checked a few. Don't see the P10V_BUILTIN_128BIT_* entries below. > P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS, > P10V_BUILTIN_128BIT_VRLQ, P10V_BUILTIN_128BIT_VRLQMI, > P10V_BUILTIN_128BIT_VRLQNM, P10V_BUILTIN_128BIT_VSLQ, > P10V_BUILTIN_128BIT_VSRQ, P10V_BUILTIN_128BIT_VSRAQ, > P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P, > P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P, > P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET, > P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P, > P10V_BUILTIN_128BIT_VSIGNEXTSD2Q, P10V_BUILTIN_128BIT_DIVES_V1TI, > P10V_BUILTIN_128BIT_MODS_V1TI, P10V_BUILTIN_128BIT_MODU_V1TI): > New overloaded definitions. > (rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT, > P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI, > P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT, > P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI, > P10_BUILTIN_CMPLE_U1TI]: New case statements. > (rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]: > New assignments. > (altivec_init_builtins): New E_V1TImode case statement. 
> (builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD, > P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI, > P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI, > P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements. I don't see these keywords below. Possibly P10V_* > * config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode, > E_V1TImode]: New case statements. > * config/rs6000/r6000.h (RS6000_BTM_TI_VECTOR_OPS): New defines. > (rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI. > * config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti, > vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti, > vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p, > vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3, > vlshrv1ti3, vashrv1ti3): New define_expands. > * config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ, > UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ, >
Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support
On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote: > Will, Segher: > > Add support for converting to/from 128-bit integers and 128-bit > decimal floating point formats. > > The updates from the previous version of the patch: > > Just a fix for the change log per Will's comments. > > No regression failures were found when run on a P9. > > Please let me know if this is ready for mainline. > >Carl > > -- > > > gcc/ChangeLog > > 2020-10-05 Carl Love > * config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns. > * config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P, > P10V_BUILTIN_VCMPAET_P): > New overloaded definitions. > > gcc/testsuite/ChangeLog > > 2020-10-05 Carl Love > * gcc.target/powerpc/int_128bit-runnable.c: Update test. Maybe 'Add 128-bit DFP conversion tests' to give it better meaning. > --- > gcc/config/rs6000/dfp.md | 14 + > gcc/config/rs6000/rs6000-call.c | 4 ++ > .../gcc.target/powerpc/int_128bit-runnable.c | 62 +++ > 3 files changed, 80 insertions(+) > > diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md > index 8f822732bac..0e82e315fee 100644 > --- a/gcc/config/rs6000/dfp.md > +++ b/gcc/config/rs6000/dfp.md > @@ -222,6 +222,13 @@ >"dcffixq %0,%1" >[(set_attr "type" "dfp")]) > > +(define_insn "floattitd2" > + [(set (match_operand:TD 0 "gpc_reg_operand" "=d") > + (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))] > + "TARGET_POWER10" > + "dcffixqq %0,%1" > + [(set_attr "type" "dfp")]) > + > ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer. > ;; This is the first stage of converting it to an integer type. 
> > @@ -241,6 +248,13 @@ >"TARGET_DFP" >"dctfix %0,%1" >[(set_attr "type" "dfp")]) > + > +(define_insn "fixtdti2" > + [(set (match_operand:TI 0 "gpc_reg_operand" "=v") > + (fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))] > + "TARGET_POWER10" > + "dctfixqq %0,%1" > + [(set_attr "type" "dfp")]) > > ;; Decimal builtin support ok > > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index 87fff5c1c80..8d00a25d806 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -4967,6 +4967,8 @@ const struct altivec_builtin_types > altivec_overloaded_builtins[] = { > RS6000_BTI_bool_V2DI, 0 }, >{ P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P, > RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 }, > + { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P, > +RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 > }, > >{ P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P, > RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, > @@ -5074,6 +5076,8 @@ const struct altivec_builtin_types > altivec_overloaded_builtins[] = { > RS6000_BTI_bool_V2DI, 0 }, >{ P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P, > RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 }, > + { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P, > +RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 > }, >{ P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P, > RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, >{ P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P, ok > diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c > b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c > index 85ad544e22b..ec3dcf3dff1 100644 > --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c > +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c > @@ -38,6 +38,7 @@ > #if DEBUG > #include > #include > +#include > > > void print_i128(__int128_t val) > @@ -59,6 +60,13 @@ int main () 
>__int128_t arg1, result; >__uint128_t uarg2; > > + _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128; > + > + struct conv_t { > +__uint128_t u128; > +_Decimal128 d128; > + } conv, conv2; > + >vector signed long long int vec_arg1_di, vec_arg2_di; >vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di; >vector unsigned long long int vec_uresult_di; > @@ -2249,6 +2257,60 @@ int main () > abort(); > #endif >} > + > + /* DFP to __int128 and __int128 to DFP conversions */ > + /* Can't get printing of DFP values to work. Print the DFP value as an > + unsigned int so we can see the bit patterns. */ Drop 'Can't get ...', just 'Print the DFP...' should be sufficient. > + conv.u128 = 0x2208ULL; > + conv.u128 = (conv.u128 << 64) | 0x4ULL; //DFP bit pattern for integer 4 > + expected_result_dfp128 = conv.d128; > > + arg1 = 4; > + > + conv.d128 = (_Decimal128) arg1; > + > + result_dfp128 = (_Decimal128) arg1; > + if (((conv.u128 >>64) != 0x2208ULL) && > + ((conv.u128 & 0xFFF
Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.
On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote: > Will, Segher: > > Patch 4 adds the vector 128-bit integer shift instruction support for > the V1TI type. > > The changes from the previous version include: > > Fixed up the change log entry issues noted by Will. > > Regression tests reran on Power 9 LE with no regression errors. > > Please let me know if it looks OK to commit to mainline. > > Carl > - > > gcc/ChangeLog > > 2020-10-05 Carl Love > * config/rs6000/altivec.md (altivec_vslq, altivec_vsrq): > Rename to altivec_vslq_, altivec_vsrq_, mode VEC_TI. > * config/rs6000/vector.md (VEC_TI): New name for VSX_TI iterator. What was the old name? (Maybe just 'New iterator' ?) Ok, back from below. this is new name and location for what was previously named VSX_TI in vsx.md. Wouldn't hurt to have a statement in the description to clarify that. "This patch renames the VSX_TI iterator to VEC_TI, and updates the users." > (vashlv1ti3): Change to vashl3, mode VEC_TI. > (vlshrv1ti3): Change to vlshr3, mode VEC_TI. > * config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator. > > gcc/testsuite/ChangeLog > > 2020-10-05 Carl Love > gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left > tests. 
> --- > gcc/config/rs6000/altivec.md | 16 - > gcc/config/rs6000/vector.md | 27 --- > gcc/config/rs6000/vsx.md | 33 +-- > .../gcc.target/powerpc/int_128bit-runnable.c | 16 +++-- > 4 files changed, 52 insertions(+), 40 deletions(-) > ok > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 34a4731342a..5db3de3cc9f 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -2219,10 +2219,10 @@ >"vsl %0,%1,%2" >[(set_attr "type" "vecsimple")]) > > -(define_insn "altivec_vslq" > - [(set (match_operand:V1TI 0 "vsx_register_operand" "=v") > - (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v") > - (match_operand:V1TI 2 "vsx_register_operand" "v")))] > +(define_insn "altivec_vslq_" > + [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v") > + (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v") > + (match_operand:VEC_TI 2 "vsx_register_operand" "v")))] >"TARGET_POWER10" >/* Shift amount in needs to be in bits[57:63] of 128-bit operand. */ >"vslq %0,%1,%2" > @@ -2236,10 +2236,10 @@ >"vsr %0,%1,%2" >[(set_attr "type" "vecsimple")]) > > -(define_insn "altivec_vsrq" > - [(set (match_operand:V1TI 0 "vsx_register_operand" "=v") > - (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v") > -(match_operand:V1TI 2 "vsx_register_operand" "v")))] > +(define_insn "altivec_vsrq_" > + [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v") > + (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v") > +(match_operand:VEC_TI 2 "vsx_register_operand" > "v")))] >"TARGET_POWER10" >/* Shift amount in needs to be in bits[57:63] of 128-bit operand. 
*/ >"vsrq %0,%1,%2" ok > diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md > index 0cca4232619..3ea3a91845a 100644 > --- a/gcc/config/rs6000/vector.md > +++ b/gcc/config/rs6000/vector.md > @@ -26,6 +26,9 @@ > ;; Vector int modes > (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI]) > > +;; 128-bit int modes > +(define_mode_iterator VEC_TI [V1TI TI]) > + > ;; Vector int modes for parity > (define_mode_iterator VEC_IP [V8HI > V4SI > @@ -1627,17 +1630,17 @@ >"") > > ;; No immediate version of this 128-bit instruction > -(define_expand "vashlv1ti3" > - [(set (match_operand:V1TI 0 "vsx_register_operand" "=v") > - (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v") > - (match_operand:V1TI 2 "vsx_register_operand" "v")))] > +(define_expand "vashl3" > + [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v") > + (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand") > + (match_operand:VEC_TI 2 "vsx_register_operand")))] >"TARGET_POWER10" > { >/* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */ > - rtx tmp = gen_reg_rtx (V1TImode); > + rtx tmp = gen_reg_rtx (mode); > >emit_insn(gen_xxswapd_v1ti (tmp, operands[2])); > - emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp)); > + emit_insn(gen_altivec_vslq_ (operands[0], operands[1], tmp)); >DONE; > }) > > @@ -1650,17 +1653,17 @@ >"") > > ;; No immediate version of this 128-bit instruction > -(define_expand "vlshrv1ti3" > - [(set (match_operand:V1TI 0 "vsx_register_operand" "=v") > - (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v") > -(match_ope
Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.
On Mon, 2020-10-05 at 11:52 -0700, Carl Love wrote: > Will, Segher: > > This patch adds support for converting to/from 128-bit integers and > 128-bit decimal floating point formats using the new P10 instructions > dcffixqq and dctfixqq. The new instructions are only used on P10 HW, > otherwise the conversions continue to use the existing SW routines. > > The changes from the previous version include: > > Fixed up the change log entry issues noted by Will. > > Regression tests reran on Power 9 LE with no regression errors. > > Please let me know if it looks OK to commit to mainline. > > Carl > - > > gcc/ChangeLog > > 2020-10-05 Carl Love > * config/rs6000/rs6000.md (floatti2, floatunsti2, > fix_truncti2, fixuns_truncti2): Add > define_insn for mode IEEE 128. > * libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c. > Update source function name. White space fixes. I'd move the generic 'Update source ... White space...' bits to the patch description. In addition to the 'Rename' statement, some form of 'Change calls of __fixkfti to __fixkfti_sw` would be more useful here. > * libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c. > Update source function name. White space fixes. > * libgcc/config/rs6000/float128-hw.c (__floattikf_hw, > __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): > New functions. ok > * libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): > New macro. > (__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve, > __fixunskfti_resolve): Add resolve functions. > (__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New > functions. ok > * libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf, > __fixtfti, __fixunstfti): Add editor commands to change > names. > * libgcc/config/rs6000/float128-sed-hw (__floattitf, > __floatuntitf, __fixtfti, __fixunstfti): Add editor commands > to change names. ok > * libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c. 
> * libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c. > * libgcc/config/rs6000/quaad-float128.h (__floattikf_sw, > __floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw, > __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf, > __floatuntikf, __fixkfti, __fixunskfti): New extern declarations. > * libgcc/config/rs6000/t-float128 (floattikf, floatuntikf, > fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs. > (floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add > file names to fp128_ppc_funcs. > > gcc/testsuite/ChangeLog > > 2020-10-05 Carl Love > * gcc.target/powerpc/fl128_conversions.c: New file. fp128_conversions.c > --- > gcc/config/rs6000/rs6000.md | 36 +++ > .../gcc.target/powerpc/fp128_conversions.c| 286 ++ > .../config/rs6000/{fixkfti.c => fixkfti-sw.c} | 4 +- > .../rs6000/{fixunskfti.c => fixunskfti-sw.c} | 7 +- > libgcc/config/rs6000/float128-hw.c| 24 ++ > libgcc/config/rs6000/float128-ifunc.c | 44 ++- > libgcc/config/rs6000/float128-sed | 4 + > libgcc/config/rs6000/float128-sed-hw | 4 + > .../rs6000/{floattikf.c => floattikf-sw.c}| 4 +- > .../{floatuntikf.c => floatuntikf-sw.c} | 4 +- > libgcc/config/rs6000/quad-float128.h | 17 +- > libgcc/config/rs6000/t-float128 | 3 +- > 12 files changed, 417 insertions(+), 20 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c > rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%) > rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%) > rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%) > rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%) > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 694ff70635e..5db5d0b4505 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -6390,6 +6390,42 @@ > xscvsxddp %x0,%x1" >[(set_attr "type" "fp")]) > > +(define_insn "floatti2" > + [(set (match_operand:IEEE128 0 
"vsx_register_operand" "=v") > + (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > +{ > + return "xscvsqqp %0,%1"; > +} > + [(set_attr "type" "fp")]) > + > +(define_insn "floatunsti2" > + [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v") > + (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" > "v")))] > + "TARGET_POWER10" > +{ > + return "xscvuqqp %0,%1"; > +} > + [(set_attr "type" "fp")]) > + > +(define_insn "fix_truncti2" > + [(set (match_operand:TI 0 "vsx_register_operand" "=v") > + (fix:TI (match_operand:IEEE128 1 "vsx_register_ope
Re: [PATCH 7/8] [RS6000] rs6000_rtx_costs reduce cost for SETs
On Thu, 2020-10-08 at 09:27 +1030, Alan Modra via Gcc-patches wrote:
> The aim of this patch is to make rtx_costs for SETs closer to
> insn_cost for SETs. One visible effect on powerpc code is increased
> if-conversion.
>
> * config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost of SET
> operands.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 76aedbfae6f..d455aa52427 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -21684,6 +21684,35 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>          }
>        return false;
>
> +    case SET:
> +      /* On entry the value in *TOTAL is the number of general purpose
> +         regs being set, multiplied by COSTS_N_INSNS (1). Handle
> +         costing of set operands specially since in most cases we have
> +         an instruction rather than just a piece of RTL and should
> +         return a cost comparable to insn_cost. That's a little
> +         complicated because in some cases the cost of SET operands is
> +         non-zero, see point 5 above and cost of PLUS for example, and
> +         in others it is zero, for example for (set (reg) (reg)).
> +         But (set (reg) (reg)) has the same insn_cost as
> +         (set (reg) (plus (reg) (reg))). Hack around this by
> +         subtracting COSTS_N_INSNS (1) from the operand cost in cases
> +         were we add at least COSTS_N_INSNS (1) for some operation.

s/were/where/ :-)

> +         However, don't do so for constants. Constants might cost
> +         more than zero when they require more than one instruction,
> +         and we do want the cost of extra instructions. */
> +      {
> +        rtx_code src_code = GET_CODE (SET_SRC (x));
> +        if (src_code == CONST_INT
> +            || src_code == CONST_DOUBLE
> +            || src_code == CONST_WIDE_INT)
> +          return false;
> +        int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> +                        + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> +        if (set_cost >= COSTS_N_INSNS (1))
> +          *total += set_cost - COSTS_N_INSNS (1);
> +        return true;
> +      }
> +
>      default:
>        return false;
>      }

lgtm,
thanks
-Will
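To make the arithmetic in that comment concrete, here is a standalone sketch of the non-constant path only. It assumes COSTS_N_INSNS (N) is N * 4, as in GCC's rtl.h. The function name adjust_set_total and the plain int operand costs are hypothetical stand-ins for the rtx_cost calls in the patch, and the constant-source early return is deliberately not modelled.

```c
#include <assert.h>

/* Assumed to match GCC's definition in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model of the non-constant SET path in the patch.
   TOTAL_ON_ENTRY is the value already in *total (registers set times
   COSTS_N_INSNS (1)); SRC_COST and DEST_COST stand in for the two
   rtx_cost results.  Returns the resulting total.  */
int
adjust_set_total (int total_on_entry, int src_cost, int dest_cost)
{
  int set_cost = src_cost + dest_cost;

  /* Only operand cost beyond one instruction is added, so a source
     that costs exactly one insn does not make the SET dearer than a
     plain register copy.  */
  if (set_cost >= COSTS_N_INSNS (1))
    total_on_entry += set_cost - COSTS_N_INSNS (1);
  return total_on_entry;
}
```

Taking a register operand as cost 0 and a PLUS as COSTS_N_INSNS (1), both (set (reg) (reg)) and (set (reg) (plus (reg) (reg))) come out at COSTS_N_INSNS (1), which is the parity with insn_cost that the comment describes. A source costing two instructions ends up at COSTS_N_INSNS (2).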
Re: [RS6000] rotate and mask constants
On Thu, 2020-10-08 at 09:36 +1030, Alan Modra via Gcc-patches wrote: > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. lgtm, Just a couple cosmetic nits, since I was reading through.. :-) > > Bootstrapped and regression tested powerpc64le-linux. > > PR 94393 PR Target/94393 (unless the hooks currently handle that for us? ) > gcc/ gcc/ChangeLog: > * config/rs6000/rs6000.c (rotate_and_mask_constant): New function. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests. 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 14ecbad5df4..9809d11f47a 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -1129,6 +1129,8 @@ static tree rs6000_handle_altivec_attribute (tree *, > tree, tree, int, bool *); > static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *); > static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree); > static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT); > +static bool rotate_and_mask_constant (unsigned HOST_WIDE_INT, HOST_WIDE_INT > *, > + int *, unsigned HOST_WIDE_INT *); > static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool); > static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, > bool); > static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t, > @@ -5877,7 +5879,7 @@ num_insns_constant_gpr (HOST_WIDE_INT value) > } > > /* Helper for num_insns_constant. Allow constants formed by the > - num_insns_constant_gpr sequences, plus li -1, rldicl/rldicr/rlwinm, > + num_insns_constant_gpr sequences, and li/lis+rldicl/rldicr/rldic/rlwinm, > and handle modes that require multiple gprs. */ > > static int > @@ -5892,8 +5894,8 @@ num_insns_constant_multi (HOST_WIDE_INT value, > machine_mode mode) >if (insns > 2 > /* We won't get more than 2 from num_insns_constant_gpr >except when TARGET_POWERPC64 and mode is DImode or > - wider, so the register mode must be DImode. */ > - && rs6000_is_valid_and_mask (GEN_INT (low), DImode)) > + wider. */ > + && rotate_and_mask_constant (low, NULL, NULL, NULL)) > insns = 2; >total += insns; >/* If BITS_PER_WORD is the number of bits in HOST_WIDE_INT, doing > @@ -9524,6 +9526,244 @@ rs6000_emit_set_const (rtx dest, rtx source) >return true; > } > > +/* Rotate DImode word, being careful to handle the case where > + HOST_WIDE_INT is larger than DImode. 
*/ > + > +static inline unsigned HOST_WIDE_INT > +rotate_di (unsigned HOST_WIDE_INT x, unsigned int shift) > +{ > + unsigned HOST_WIDE_INT mask_hi, mask_lo; > + > + mask_hi = (HOST_WIDE_INT_1U << 63 << 1) - (HOST_WIDE_INT_1U << shift); > + mask_lo = (HOST_WIDE_INT_1U << shift) - 1; > + x = ((x << shift) & mask_hi) | ((x >> (64 - shift)) & mask_lo); > + x = (x ^ (HOST_WIDE_INT_1U << 63)) - (HOST_WIDE_INT_1U << 63); > + return x; > +} > + > +/* Can C be formed by rotating a 16-bit positive value left by C16LSB? */ > + > +static inline bool > +is_rotate_positive_constant (unsigned HOST_WIDE_INT c, int c16lsb, > + HOST_WIDE_INT *val, int *shift, > + unsigned HOST_WIDE_INT *mask) > +{ > + if ((c & ~(HOST_WIDE_INT_UC (0x7fff) << c16lsb)) == 0) > +{ > + /* eg. c = 1100 ... > + -> val = 0x3000, shift = 49, mask = -1ull. */ > + if (val) > + { > + c >>= c16lsb; > + /* Make the value and shift canonical in the sense of > + selecting the smallest value. For the example above > + -> val = 3, shift = 61. */ > + int trail_zeros = ctz_hwi (c); > + c >>= trail_zeros; > + c16lsb += trail_zeros; > + *val = c; > + *shift = c16lsb; > + *mask = HOST_WIDE_INT_M1U; > + } > + return true; > +} > + return false; > +} > + > +/* Can C be formed by rotating a 16-bit negative value left by C16LSB? */ > + > +static inline bool > +is_rotate_negative_constant (unsigned HOST_WIDE_INT c, int c16lsb, > + HOST_WIDE_INT *val, int *shift, > + unsigned HOST_WIDE_INT *mask) > +{ > + if ((c | (HOST_WIDE_INT_UC (0x7fff) << c16lsb)) == HOST_WIDE_INT_M1U) > +{ > + if (val) > + { > + c >>= c16lsb; > +
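As a reading aid for the helpers quoted above, here is a simplified standalone sketch using fixed-width uint64_t. Because the type is exactly 64 bits, the masking and sign extension that rotate_di does for a wider HOST_WIDE_INT are not needed. The names rotl64 and fits_positive_window are hypothetical; the second function copies only the mask test from is_rotate_positive_constant, not the canonicalisation of val and shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 64-bit value left by SHIFT bits, 0 <= SHIFT < 64.
   Hypothetical stand-in for rotate_di on a host where the word is
   exactly 64 bits wide.  */
uint64_t
rotl64 (uint64_t x, unsigned int shift)
{
  assert (shift < 64);
  if (shift == 0)
    return x;                   /* avoid the undefined shift by 64 */
  return (x << shift) | (x >> (64 - shift));
}

/* Same mask test as is_rotate_positive_constant: true when every set
   bit of C lies inside the 15-bit window that starts at bit C16LSB,
   so C is a non-negative 16-bit value shifted left by C16LSB.  */
bool
fits_positive_window (uint64_t c, int c16lsb)
{
  assert (c16lsb >= 0 && c16lsb < 64);
  return (c & ~(UINT64_C (0x7fff) << c16lsb)) == 0;
}
```

For example, rotl64 (1, 63) gives 0x8000000000000000, and fits_positive_window (0x30000, 4) is true because 0x30000 lies entirely inside the mask 0x7fff0.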
[PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)
[PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. Hi, This adds an assortment of tests to exercise the -mno-vsx option and confirm the impacts on the ARCH_PWR8 define. These are based on and inspired by PR 101865, which reports that _ARCH_PWR8 is disabled when -mno-vsx is passed on the commandline. There are a small number of failures introduced by these tests, those are resolved with the changes in part 2. OK for trunk? Thanks, -Will gcc/testsuite: * gcc.target/powerpc/predefine_p7-novsx.c: New test. * gcc.target/powerpc/predefine_p8-noaltivec-novsx.c: New test. * gcc.target/powerpc/predefine_p8-novsx.c: New test. * gcc.target/powerpc/predefine_p9-novsx.c: New test. * gcc.target/powerpc/predefine_pragma_vsx.c: New test. diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c new file mode 100644 index ..e842025b4d3c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c @@ -0,0 +1,9 @@ +/* { dg-do preprocess } */ +/* Test whether the ARCH_PWR7 and ARCH_PWR8 defines gets set + * when we specify power7, plus options. 
+/* This is a variation of the test at issue in GCC PR 101865 */ +/* { dg-options "-dM -E -mdejagnu-cpu=power7 -mno-vsx" } */ +/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR7 1($|\\n)" } } */ +/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)" } } */ +/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */ +/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c new file mode 100644 index ..c3b705ca3d48 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c @@ -0,0 +1,7 @@ +/* { dg-do preprocess } */ +/* Test whether the ARCH_PWR8 define remains set after disabling both altivec and vsx. */ +/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-altivec -mno-vsx" } */ +/* { dg-final { scan-file predefine_p8-noaltivec-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)" } } */ +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define _ARCH_PWR9 1($|\\n)" } } */ +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */ +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c new file mode 100644 index ..8b6c69b20104 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c @@ -0,0 +1,8 @@ +/* { dg-do preprocess } */ +/* Test whether the ARCH_PWR8 define remains set after disabling vsx. + This also confirms __ALTIVEC__ remains set when VSX is disabled. 
*/ +/* This is the primary test at issue in GCC PR 101865 */ +/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-vsx" } */ +/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)" } } */ +/* { dg-final { scan-file-not predefine_p8-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */ +/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c new file mode 100644 index ..eef42c111663 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c @@ -0,0 +1,10 @@ +/* { dg-do preprocess } */ +/* Test whether the ARCH_PWR8 define remains set after disabling vsx. + This also confirms __ALTIVEC__ remains set when VSX is disabled. */ +/* This is the primary test at issue in GCC PR 101865 */ +/* { dg-options "-dM -E -mdejagnu-cpu=power9 -mno-vsx" } */ +/* {xfail *-*-*} */ +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)" } } */ +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR9 1($|\\n)" } } */ +/* { dg-final { scan-file-not predefine_p9-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */ +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c new file mode 100644 index ..b300600af999 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c @@ -0,0 +1,83 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */ + +/* Ensure that if we set a pragma gcc target for an + older processor, we do not compile builtins that + the older target does not supp
[PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)
[PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] Hi, The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE, and can be disabled by dependent options when it should not be. This manifests in the issue seen in PR101865 where -mno-vsx mistakenly disables _ARCH_PWR8. This change replaces the relevant TARGET_DIRECT_MOVE references with a TARGET_POWER8 entry so that the direct_move and power8 features can be enabled or disabled independently. This is done via the OPTION_MASK definitions, so this means that some references to the OPTION_MASK_DIRECT_MOVE option are now replaced with OPTION_MASK_POWER8. The existing (and rather lengthy) commentary for DIRECT_MOVE remains in place in rs6000-c.cc:rs6000_target_modify_macros(). The if-defined logic there will now set a __DIRECT_MOVE__ define when TARGET_DIRECT_MOVE is set; this serves as a placeholder for debug purposes, but is otherwise unused. This can be removed in a subsequent patch, or in an update of this patch, depending on feedback. This regtests cleanly (power8, power9, power10), and resolves PR 101865 as represented in the tests from (1/2). OK for trunk? Thanks, -Will gcc/ PR target/101865 * config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE usage with TARGET_POWER8. * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add __DIRECT_MOVE__ define. Replace _ARCH_PWR8 define conditional with OPTION_MASK_POWER8. * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Add OPTION_MASK_POWER8 entry. (POWERPC_MASKS): Same. * config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8. (rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8. * config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8. * config/rs6000/vsx.md (vsx_extract_): Replace TARGET_DIRECT_MOVE usage with TARGET_POWER8. (define_peephole2): Same.
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index 3ce729c1e6de..91a0f39bd796 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode) case ENB_P7: return TARGET_POPCNTD; case ENB_P7_64: return TARGET_POPCNTD && TARGET_POWERPC64; case ENB_P8: - return TARGET_DIRECT_MOVE; + return TARGET_POWER8; case ENB_P8V: return TARGET_P8_VECTOR; case ENB_P9: return TARGET_MODULO; case ENB_P9_64: diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc index ca9cc42028f7..41d51b039061 100644 --- a/gcc/config/rs6000/rs6000-c.cc +++ b/gcc/config/rs6000/rs6000-c.cc @@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags) turned off in any of the following conditions: 1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly disabled and OPTION_MASK_DIRECT_MOVE was not explicitly enabled. 2. TARGET_VSX is off. */ if ((flags & OPTION_MASK_DIRECT_MOVE) != 0) +rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__"); + if ((flags & OPTION_MASK_POWER8) != 0) rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8"); if ((flags & OPTION_MASK_MODULO) != 0) rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9"); if ((flags & OPTION_MASK_POWER10) != 0) rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10"); diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index c3825bcccd84..c873f6d58989 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -48,10 +48,11 @@ system. */ #define ISA_2_7_MASKS_SERVER (ISA_2_6_MASKS_SERVER \ | OPTION_MASK_P8_VECTOR\ | OPTION_MASK_CRYPTO \ | OPTION_MASK_DIRECT_MOVE \ +| OPTION_MASK_POWER8 \ | OPTION_MASK_EFFICIENT_UNALIGNED_VSX \ | OPTION_MASK_QUAD_MEMORY \ | OPTION_MASK_QUAD_MEMORY_ATOMIC) /* ISA masks setting fusion options.
*/ @@ -124,10 +125,11 @@ #define POWERPC_MASKS (OPTION_MASK_ALTIVEC\ | OPTION_MASK_CMPB \ | OPTION_MASK_CRYPTO \ | OPTION_MASK_DFP \ | OPTION_MASK_DIRECT_MOVE \ +| OPTION_MASK_POWER8
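As a reading aid for the (2/2) patch above, the following toy model shows why keying _ARCH_PWR8 off the direct-move bit loses the define under -mno-vsx, while a dedicated power8 bit survives. It is a sketch under assumptions: the MASK_* values and all function names are hypothetical, and the real OPTION_MASK_* bits and option-override logic live in the rs6000 backend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only; these are not
   the real OPTION_MASK_* values.  */
#define MASK_VSX          (UINT64_C (1) << 0)
#define MASK_DIRECT_MOVE  (UINT64_C (1) << 1)
#define MASK_POWER8       (UINT64_C (1) << 2)

/* Flags for a power8-like CPU in this toy model.  */
static uint64_t
power8_flags (void)
{
  return MASK_VSX | MASK_DIRECT_MOVE | MASK_POWER8;
}

/* Toy model of -mno-vsx: turning VSX off also turns off the dependent
   direct-move bit, but leaves the independent power8 bit alone.  */
static uint64_t
apply_no_vsx (uint64_t flags)
{
  return flags & ~(MASK_VSX | MASK_DIRECT_MOVE);
}

/* Old condition: _ARCH_PWR8 follows the direct-move bit.  */
static bool
arch_pwr8_old (uint64_t flags)
{
  return (flags & MASK_DIRECT_MOVE) != 0;
}

/* New condition: _ARCH_PWR8 follows its own power8 bit.  */
static bool
arch_pwr8_new (uint64_t flags)
{
  return (flags & MASK_POWER8) != 0;
}
```

In this model arch_pwr8_old (apply_no_vsx (power8_flags ())) is false, which is the PR 101865 symptom, while arch_pwr8_new on the same flags stays true.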
[PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines
[PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines Hi, This is the first of a batch of changes that eliminate a number of define TARGET_foo entries we have collected over time. TARGET_CTZ is defined as TARGET_MODULO, and has a low number of uses. References to TARGET_CTZ should be safe to replace with TARGET_MODULO throughout. TARGET_FCTIDZ is entirely unused, and safe to remove. TARGET_FCTIWUZ has a low number of uses, and can be directly replaced with TARGET_POPCNTD. This eliminates three defines. There should be no codegen changes, and this has regtested OK. OK for trunk? Thanks, gcc/ * config/rs6000/rs6000.h (TARGET_CTZ): Replace with TARGET_MODULO. (TARGET_FCTIDZ): Remove. (TARGET_FCTIWUZ): Replace with TARGET_POPCNTD. * config/rs6000/rs6000.cc (TARGET_CTZ): Replace with TARGET_MODULO. * config/rs6000/rs6000.md (ctz2): Replace TARGET_CTZ with TARGET_MODULO. (ctz2_hw): Same. (fixuns_truncsi2): Replace TARGET_FCTIWUZ with TARGET_POPCNTD. (fixuns_truncsi2_stfiwx): Same. (fctiwz_): Same. diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index fcca062a8709..eea427b1ca51 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -21998,11 +21998,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, if (!TARGET_MODULO && (code == MOD || code == UMOD)) *total += COSTS_N_INSNS (2); return false; case CTZ: - *total = COSTS_N_INSNS (TARGET_CTZ ? 1 : 4); + *total = COSTS_N_INSNS (TARGET_MODULO ? 
1 : 4); return false; case FFS: *total = COSTS_N_INSNS (4); return false; diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index eb7b21584970..ee887efd1122 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -456,20 +456,17 @@ extern int rs6000_vector_align[]; || TARGET_PPC_GPOPT/* 970/power4 */\ || TARGET_POPCNTB /* ISA 2.02 */ \ || TARGET_CMPB /* ISA 2.05 */ \ || TARGET_POPCNTD) /* ISA 2.06 */ -#define TARGET_FCTIDZ TARGET_FCFID #define TARGET_STFIWX TARGET_PPC_GFXOPT #define TARGET_LFIWAX TARGET_CMPB #define TARGET_LFIWZX TARGET_POPCNTD #define TARGET_FCFIDS TARGET_POPCNTD #define TARGET_FCFIDU TARGET_POPCNTD #define TARGET_FCFIDUS TARGET_POPCNTD #define TARGET_FCTIDUZ TARGET_POPCNTD -#define TARGET_FCTIWUZ TARGET_POPCNTD -#define TARGET_CTZ TARGET_MODULO #define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64) #define TARGET_MADDLD TARGET_MODULO #define TARGET_XSCVDPSPN (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR) #define TARGET_XSCVSPDPN (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR) @@ -1751,11 +1748,11 @@ typedef struct rs6000_args /* The CTZ patterns that are implemented in terms of CLZ return -1 for input of zero. The hardware instructions added in Power9 and the sequences using popcount return 32 or 64. */ #define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ - (TARGET_CTZ || TARGET_POPCNTD \ + (TARGET_MODULO || TARGET_POPCNTD \ ? ((VALUE) = GET_MODE_BITSIZE (MODE), 2)\ : ((VALUE) = -1, 2)) /* Specify the machine mode that pointers have. 
After generation of rtl, the compiler makes no further distinction diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index ad5a4cf2ef83..619a87374734 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -2414,11 +2414,11 @@ (define_insn "clz2" (define_expand "ctz2" [(set (match_operand:GPR 0 "gpc_reg_operand") (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand")))] "" { - if (TARGET_CTZ) + if (TARGET_MODULO) { emit_insn (gen_ctz2_hw (operands[0], operands[1])); DONE; } @@ -2445,11 +2445,11 @@ (define_expand "ctz2" }) (define_insn "ctz2_hw" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))] - "TARGET_CTZ" + "TARGET_MODULO" "cnttz %0,%1" [(set_attr "type" "cntlz")]) (define_expand "ffs2" [(set (match_operand:GPR 0 "gpc_reg_operand") @@ -6326,11 +6326,11 @@ (define_insn_and_split "*fix_trunc2_mem" }) (define_expand "fixuns_truncsi2" [(set (match_operand:SI 0 "gpc_reg_operand") (unsigned_fix:SI (match_operand:SFDF 1 "gpc_reg_operand")))] - "TARGET_HARD_FLOAT && TARGET_FCTIWUZ && TARGET_STFIWX" + "TARGET_HARD_FLOAT && TARGET_POPCNTD && TARGET_STFIWX" { if (!TARGET_P8_VECTOR) { emit_insn (gen_fixuns_truncsi2_stfiwx (operands[0], operands[1])); DONE; @@ -6339,11 +6339,11 @@ (define_expand "
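The CTZ_DEFINED_VALUE_AT_ZERO comment in the patch above notes that the CLZ-based sequence yields -1 for a zero input, while the popcount-based sequence yields the mode's bit size. A small portable sketch of the two fallback shapes makes that visible. This is an illustration only: the helper names are hypothetical, the loops stand in for the hardware count instructions, and clz32 is assumed to return 32 for zero as the hardware count-leading-zeros instruction does.

```c
#include <stdint.h>

/* Count leading zeros; returns 32 for x == 0 (assumed to match the
   hardware count-leading-zeros behaviour).  */
static int
clz32 (uint32_t x)
{
  int n = 0;
  if (x == 0)
    return 32;
  while ((x & UINT32_C (0x80000000)) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Count set bits by clearing the lowest set bit each iteration.  */
static int
popcount32 (uint32_t x)
{
  int n = 0;
  while (x != 0)
    {
      x &= x - 1;
      n++;
    }
  return n;
}

/* CTZ in terms of CLZ: isolate the lowest set bit, then 31 - clz.
   For x == 0 this gives 31 - 32 = -1.  */
static int
ctz32_via_clz (uint32_t x)
{
  uint32_t lowest = (uint32_t) (x & (0u - x));
  return 31 - clz32 (lowest);
}

/* CTZ in terms of popcount: count the ones strictly below the lowest
   set bit.  For x == 0 the mask is all ones, giving 32.  */
static int
ctz32_via_popcount (uint32_t x)
{
  return popcount32 ((uint32_t) (~x & (x - 1)));
}
```

Both helpers agree for every nonzero input; they differ only at zero, which is the distinction the macro has to encode.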
Re: [PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines
On Tue, 2022-09-20 at 16:14 -0500, Segher Boessenkool wrote: > Hi! > > On Mon, Sep 19, 2022 at 06:19:15PM -0500, will schmidt wrote: > > This is the first of a batch of changes that eliminate a number > > of define TARGET_foo entries we have collected over time. > > Good good :-) > > > TARGET_CTZ is defined as TARGET_MODULO, and has a low number > > of uses. References to TARGET_CTZ should be safe to replace > > with TARGET_MODULO throughout. > > No, please don't. This has nothing to do with "modulo". If you want to > say this is just whether we have ISA 3.0 or p9, make a new target macro > for *that* and use that everywhere. > > This is a general issue, that will make the code much more sane if you > can fix it! > > > TARGET_FCTIDZ is entirely unused, and safe to remove. > > Please make separate patches for separate issues. This makes it much > easier to review, and MUCH easier for all other ways we need to handle > it (backports, reverts, everything else). With Git it is *easier* to > keep separate patches separate than it is to lump it all together. So, > the trick is to keep things in separate commits during development > already (and you will find more benefits doing that, too!) Yup, I actually developed these three (plus a bunch more) separately, but combined the first three for posting. I'll split them back out and repost after a bit. > > TARGET_FCTIDZ was never used, it always used TARGET_FCFID directly. > > The original PEM mistakenly said this insn is "64-bit only". This was > fixed in ISA 2.01. > > > TARGET_FCTIWUZ has a low number of uses, and can be directly > > replaced with TARGET_POPCNTD. > > It is a p7 (ISA 2.06) insn. Please make a TARGET_P7 or such? Yes. I do have a change later in the (unposted) series to replace POPCNTD with POWER7; at a glance that's #17 down the line. In review I agree with your comment that the in-between changes aren't the best choices.
I'll see about skipping the in-between values and going straight for POPCNTD->POWER7. I am looking at the TARGET_POWER10 notation as the target style, versus TARGET_P7, but I can go that direction if we think that would be preferred. Maybe it is since this is a retro-fix versus new. :-) > > In the current situation target macros like TARGET_POPCNTD are abused to > mean either "can we use the popcntd insn", or to mean "can we use insn > new on p7". Or sometimes something in between, or something in this > general neighbourhood. It is never clear which is meant, which makes it > very hard to untangle this. But thanks for trying! :-) > > (Don't let me discourage you btw, most is pretty straightforward). Absolutely. I do have this mostly covered locally; I just need to refine a few parts. :-) > > > > * config/rs6000/rs6000.h (TARGET_CTZ): Replace with > > TARGET_MODULO. > > Changelogs are indented with tabs, and this fits on one line. > > So, please make TARGET_P7 and such, and OPTION_MASKs for those in > rs6000-cpus.def? Will do, thanks -Will > > > Segher
Re: [PATCH] fixincludes: Deal also with the _Float128x cases [PR107059]
On Fri, 2022-09-30 at 09:20 +0200, Jakub Jelinek via Gcc-patches wrote: > On Wed, Sep 28, 2022 at 08:19:43PM +0200, Jakub Jelinek via Gcc-patches wrote: > > Another case are the following 3 snippets: > > # if !__GNUC_PREREQ (7, 0) || defined __cplusplus > > # error "_Float128X supported but no constant suffix" > > # else > > # define __f128x(x) x##f128x > > # endif > > ... > > # if !__GNUC_PREREQ (7, 0) || defined __cplusplus > > # error "_Float128X supported but no complex type" > > # else > > # define __CFLOAT128X _Complex _Float128x > > # endif > > ... > > # if !__GNUC_PREREQ (7, 0) || defined __cplusplus > > # error "_Float128x supported but no type" > > # endif > > but as no target has _Float128x right now and don't see it > > coming soon, it isn't a big deal (on the glibc side it is of > > course ok to adjust those). > > This incremental patch handles the above 3 cases, so we > fixinclude what glibc itself changed too. > > Bootstrapped/regtested on x86_64-linux and i686-linux (together with the > previously posted fixincludes/ change too), ok for trunk? Hi, The combination of these two patches allows me to build gcc successfully (PPC64LE with RHEL9). One nit: Part 1 needed massaging of the file paths (i.e. gcc/inclhack.def versus fixincludes/inclhack.def) to apply. I can't otherwise speak to the changes, aside from noting that they seem to work for me. Thanks -Will > > 2022-09-30 Jakub Jelinek > > PR bootstrap/107059 > * inclhack.def (glibc_cxx_floatn_5): New. > * fixincl.x: Regenerated. > * tests/base/bits/floatn.h: Regenerated.
> > --- fixincludes/inclhack.def.jj 2022-09-29 22:18:47.974402688 > +0200 > +++ fixincludes/inclhack.def 2022-09-29 22:22:48.151145670 +0200 > @@ -2131,6 +2131,23 @@ fix = { > EOT; > }; > > +fix = { > +hackname = glibc_cxx_floatn_5; > +files = bits/floatn.h, bits/floatn-common.h, > "*/bits/floatn.h", "*/bits/floatn-common.h"; > +select= "^([ \t]*#[ \t]*if !__GNUC_PREREQ \\(7, 0\\) \\|\\| > )defined __cplusplus\n" > + "([ \t]*#[ \t]+error \"_Float128[xX] supported but no > )"; > +c_fix = format; > +c_fix_arg = "%1(defined __cplusplus && !__GNUC_PREREQ (13, > 0))\n%2"; > +test_text = <<-EOT > + # if !__GNUC_PREREQ (7, 0) || defined __cplusplus > + # error "_Float128X supported but no constant suffix" > + # endif > + # if !__GNUC_PREREQ (7, 0) || defined __cplusplus > + # error "_Float128x supported but no type" > + # endif > + EOT; > +}; > + > /* glibc-2.3.5 defines pthread mutex initializers incorrectly, > * so we replace them with versions that correspond to the > * definition. > --- fixincludes/fixincl.x.jj 2022-09-29 22:18:47.975402675 +0200 > +++ fixincludes/fixincl.x 2022-09-29 22:22:55.675909244 +0200 > @@ -2,11 +2,11 @@ > * > * DO NOT EDIT THIS FILE (fixincl.x) > * > - * It has been AutoGen-ed September 28, 2022 at 07:56:15 PM by > AutoGen 5.18.16 > + * It has been AutoGen-ed September 29, 2022 at 10:22:55 PM by > AutoGen 5.18.16 > * From the definitionsinclhack.def > * and the template file fixincl > */ > -/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Sep 28 19:56:15 CEST 2022 > +/* DO NOT SVN-MERGE THIS FILE, EITHER Thu Sep 29 22:22:55 CEST 2022 > * > * You must regenerate it. Use the ./genfixes script. > * > @@ -15,7 +15,7 @@ > * certain ANSI-incompatible system header files which are fixed to > work > * correctly with ANSI C and placed in a directory that GNU C will > search. > * > - * This file contains 271 fixup descriptions. > + * This file contains 272 fixup descriptions. > * > * See README for more information. 
> * > @@ -4273,6 +4273,43 @@ static const char* apzGlibc_Cxx_Floatn_4 > > /* * * * * * * * * * * * * * * * * * * * * * * * * * > * > + * Description of Glibc_Cxx_Floatn_5 fix > + */ > +tSCC zGlibc_Cxx_Floatn_5Name[] = > + "glibc_cxx_floatn_5"; > + > +/* > + * File name selection pattern > + */ > +tSCC zGlibc_Cxx_Floatn_5List[] = > + "bits/floatn.h\0bits/floatn- > common.h\0*/bits/floatn.h\0*/bits/floatn-common.h\0"; > +/* > + * Machine/OS name selection pattern > + */ > +#define apzGlibc_Cxx_Floatn_5Machs (const char**)NULL > + > +/* > + * content selection pattern - do fix if pattern found > + */ > +tSCC zGlibc_Cxx_Floatn_5Select0[] = > + "^([ \t]*#[ \t]*if !__GNUC_PREREQ \\(7, 0\\) \\|\\| )defined > __cplusplus\n\ > +([ \t]*#[ \t]+error \"_Float128[xX] supported but no )"; > + > +#defineGLIBC_CXX_FLOATN_5_TEST_CT 1 > +static tTestDesc aGlibc_Cxx_Floatn_5Tests[] = { > + { TT_EGREP,zGlibc_Cxx_Floatn_5Select0, (regex_t*)NULL }, }; > + > +/* > + * Fix Command Arguments for Glibc_Cxx_Floatn_5 > + */ > +static const char* apzGlibc_Cxx_Floatn_5Patch[] = { > +"format", > +"%1(defined __cplusplus && !__GNUC_PREREQ (13, 0))\n\ > +%2", > +(char*)NULL }; > + > +/* * * * * * * * * * * * * * * * * * * * * * * * * *
[PATCH, rs6000] Fix addg6s builtin with long long parameters. (PR100693)
[PATCH, rs6000] Fix addg6s builtin with long long parameters. (PR100693) Hi, As reported in PR 100693, attempts to use __builtin_addg6s with long long arguments result in truncated results. Since the int and long long types can be coerced into each other, (documented further near the rs6000-c.cc change), this is handled by adding a builtin overload (ADDG6S_OV), and the addition of some special handling in altivec_resolve_overloaded_builtin() to map the calls to addg6s_32 or addg6s_64; similar to how the SCAL_CMPB builtins are currently handled. This has sniff-tested cleanly. I'm seeing a regression failure show up in testsuite/g++.dg/modules/adl-3*.c; which seems entirely unrelated to the content in this change. I'm poking at that a bit more to see if I can tell the what/why for that. OK for trunk? Thanks, -Will gcc/ PR target/100693 * config/rs6000/rs6000-builtins.def ([POWER7]): Replace bif-name __builtin_addg6s with bif-name __builtin_addg6s_32. ([POWER7-64]): New bif-name __builtin_addg6s_64. * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin): Add handler mapping RS6000_OVLD_ADDG6S_OV to RS6000_BIF_ADDG6S and RS6000_BIF_ADDG6S_32. * config/rs6000/rs6000-overload.def (ADDG6S_OV): Add overloaded entry __builtin_addg6s mapped to ADDG6S_32 and ADDG6S. * config/rs6000/rs6000.md ("addg6s", UNSPEC_ADDG6S): Replace with ("addg6s3") and rework. * doc/extend.texi (__builtin_addg6s): Add documentation for __builtin_addg6s with unsigned long long parameters. gcc/testsuite/ * testsuite/gcc.target/powerpc/pr100693-compile.c: New. * testsuite/gcc.target/powerpc/pr100693-run.c: New. diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index f76f54793d73..11050e4c26d5 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -2010,12 +2010,13 @@ XXSPLTD_V2DI vsx_xxspltd_v2di {} ; Power7 builtins (ISA 2.06). 
[power7] - const unsigned int __builtin_addg6s (unsigned int, unsigned int); -ADDG6S addg6s {} + + const unsigned int __builtin_addg6s_32 (unsigned int, unsigned int); +ADDG6S_32 addg6ssi3 {} const signed long __builtin_bpermd (signed long, signed long); BPERMD bpermd_di {32bit} const unsigned int __builtin_cbcdtd (unsigned int); @@ -2041,10 +2042,14 @@ UNPACK_V1TI unpackv1ti {} ; Power7 builtins requiring 64-bit GPRs (even with 32-bit addressing). [power7-64] + + const unsigned long __builtin_addg6s_64 (unsigned long, unsigned long); +ADDG6S addg6sdi3 {no32bit} + const signed long long __builtin_divde (signed long long, signed long long); DIVDE dive_di {} const unsigned long long __builtin_divdeu (unsigned long long, \ unsigned long long); diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc index 566094626293..28e8b6761ce5 100644 --- a/gcc/config/rs6000/rs6000-c.cc +++ b/gcc/config/rs6000/rs6000-c.cc @@ -1919,10 +1919,35 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl, instance_code, fcode, types, args); if (call != error_mark_node) return call; break; } + /* We need to special case __builtin_addg6s because the overloaded +forms of this function take (unsigned int, unsigned int) or +(unsigned long long, unsigned long long). Since C conventions +allow the respective argument types to be implicitly coerced into +each other, the default handling does not provide adequate +discrimination between the desired forms of the function. */ +case RS6000_OVLD_ADDG6S_OV: + { + machine_mode arg1_mode = TYPE_MODE (types[0]); + machine_mode arg2_mode = TYPE_MODE (types[1]); + + /* If any supplied arguments are wider than 32 bits, resolve to + 64-bit variant of built-in function. 
*/ + if (GET_MODE_PRECISION (arg1_mode) > 32 + || GET_MODE_PRECISION (arg2_mode) > 32) + instance_code = RS6000_BIF_ADDG6S; + else + instance_code = RS6000_BIF_ADDG6S_32; + + tree call = find_instance (&unsupported_builtin, &instance, + instance_code, fcode, types, args); + if (call != error_mark_node) + return call; + break; + } case RS6000_OVLD_VEC_VSIE: { machine_mode arg1_mode = TYPE_MODE (types[0]); /* If supplied first argument is wider than 64 bits, resolve to diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 44e2945aaa0e..41b74c0c1500 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -193,10 +193,16 @@ unsigned int __builtin_cmpb (unsigned int, unsigned int); CMPB_32 unsigned long long __built
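The resolution rule in the rs6000-c.cc hunk above (pick the 64-bit form when either argument is wider than 32 bits) can be imitated in plain C to show both the truncation problem and the fix. This is a sketch under assumptions: op_32, op_64, call_narrow and OP are hypothetical stand-ins using ordinary addition, not the addg6s instruction, and sizeof is used where the patch uses GET_MODE_PRECISION.

```c
#include <stdint.h>

/* Hypothetical 32-bit and 64-bit variants; plain addition stands in
   for the real operation.  */
static uint32_t
op_32 (uint32_t a, uint32_t b)
{
  return a + b;
}

static uint64_t
op_64 (uint64_t a, uint64_t b)
{
  return a + b;
}

/* Models the reported problem: wide arguments handed to the 32-bit
   form are silently narrowed first.  The casts spell out what the
   implicit conversion would do.  */
static uint64_t
call_narrow (uint64_t a, uint64_t b)
{
  return op_32 ((uint32_t) a, (uint32_t) b);
}

/* Models the new resolution rule: if either argument is wider than
   32 bits, use the 64-bit variant, otherwise the 32-bit one.  */
#define OP(a, b)                                         \
  ((sizeof (a) > 4 || sizeof (b) > 4)                    \
   ? op_64 ((uint64_t) (a), (uint64_t) (b))              \
   : (uint64_t) op_32 ((uint32_t) (a), (uint32_t) (b)))
```

For example, OP (UINT64_C (0x100000000), 1u) keeps the upper bits, whereas call_narrow with the same values drops them.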
Re: [PATCH] Optimize signed DImode -> TImode on power10, PR target/104698
On Mon, 2022-02-28 at 22:21 -0500, Michael Meissner wrote: > Optimize signed DImode -> TImode on power10, PR target/104698. > Hi, Logic seems OK to me; a few suggestions on the comments are intermixed below. As always, I defer if there are counterarguments. :-) > On power10, GCC tries to optimize the signed conversion from DImode to > TImode by using the vextsd2q instruction. However to generate this > instruction, it would have to generate 3 direct moves (1 from the GPR > registers to the altivec registers, and 2 from the altivec registers to > the GPR register). > > This patch adds code back in to use the shift right immediate instruction > to do the conversion if the target/source is GPR registers. Perhaps drop "back in". If it's necessary to call out a previous commit that removed the code for whatever reason, certainly do so. It's not clear from context if that was the case. > > 2022-02-28 Michael Meissner > > gcc/ > PR target/104698 > * config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete. > (extendditi2): Replace with code to deal with both GPR registers > and with altivec registers. Perhaps enhance with (extendditi2): Convert from define_expand to define_insn_and_split. Replace with code ... > --- > gcc/config/rs6000/vsx.md | 73 > 1 file changed, 52 insertions(+), 21 deletions(-) > > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index b53de103872..62464f67f4d 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -5023,15 +5023,58 @@ (define_expand "vsignextend_si_v2di" >DONE; > }) > > -;; ISA 3.1 vector sign extend > -;; Move DI value from GPR to TI mode in VSX register, word 1. > -(define_insn "mtvsrdd_diti_w1" > - [(set (match_operand:TI 0 "register_operand" "=wa") > - (unspec:TI [(match_operand:DI 1 "register_operand" "r")] > - UNSPEC_MTVSRD_DITI_W1))] > - "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" > - "mtvsrdd %x0,0,%1" > - [(set_attr "type" "vecmove")]) > +;; Sign extend DI to TI.
We provide both GPR targets and Altivec targets. > If > +;; the register allocator prefers the GPRs, we won't have to move the value > to > +;; the altivec registers, do the vextsd2q instruction and move it back. If > we > +;; aren't compiling for 64-bit power10, don't provide the service and let the > +;; machine independent code handle the extension. So, the ".. we won't have to ..." applies to the altivec target path here? Describing it in terms of what the code doesn't do doesn't seem right. If so, and perhaps even if not, I suggest rearranging the comment slightly so it can be read as an either/or. If the register allocator prefers the GPRs, ... Otherwise, for altivec registers we do the vextsd2q ... > +(define_insn_and_split "extendditi2" > + [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v") > + (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z"))) > + (clobber (reg:DI CA_REGNO))] > + "TARGET_POWERPC64 && TARGET_POWER10" > + "#" > + "&& reload_completed" > + [(pc)] > +{ > + rtx dest = operands[0]; > + rtx src = operands[1]; > + int dest_regno = reg_or_subregno (dest); > + > + /* Handle conversion to GPR registers. Load up the low part and then do > + a sign extension to the upper part. */ > + if (INT_REGNO_P (dest_regno)) > +{ > + rtx dest_hi = gen_highpart (DImode, dest); > + rtx dest_lo = gen_lowpart (DImode, dest); > + > + emit_move_insn (dest_lo, src); > + emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63))); > + DONE; > +} ok > + > + /* For conversion to Altivec register, generate either a splat operation or > + a load rightmost double word instruction. Both instructions gets the > + DImode value into the lower 64 bits, and then do the vextsd2q > + instruction. */ consider s/instruction.
Both instructions gets/to get/ > + else if (ALTIVEC_REGNO_P (dest_regno)) > +{ > + if (MEM_P (src)) > + emit_insn (gen_vsx_lxvrdx (dest, src)); > + else > + { > + rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno); > + emit_insn (gen_vsx_splat_v2di (dest_v2di, src)); > + } > + > + emit_insn (gen_extendditi2_vector (dest, dest)); > + DONE; > +} ok lgtm, thanks -Will > + > + else > +gcc_unreachable (); > +} > + [(set_attr "length" "8")]) > > ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg > (define_insn "extendditi2_vector" > @@ -5042,18 +5085,6 @@ (define_insn "extendditi2_vector" >"vextsd2q %0,%1" >[(set_attr "type" "vecexts")]) > > -(define_expand "extendditi2" > - [(set (match_operand:TI 0 "gpc_reg_operand") > - (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))] > - "TARGET_POWER10" > - { > -/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits. > */ > -rtx temp = gen_reg_rtx (TImode); > -emit
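The GPR path reviewed above ("load up the low part and then do a sign extension to the upper part") amounts to copying the 64-bit value into the low half and filling the high half with copies of its sign bit. A minimal sketch, assuming a hypothetical two-word ti_pair type in place of a TImode register pair:

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value held in two 64-bit GPRs.  */
typedef struct
{
  uint64_t lo;  /* low 64 bits  */
  uint64_t hi;  /* high 64 bits  */
} ti_pair;

/* Sign extend a 64-bit value to 128 bits the way the GPR path does:
   move the source into the low half, then derive the high half from
   the sign.  */
static ti_pair
sign_extend_di_to_ti (int64_t src)
{
  ti_pair dest;
  dest.lo = (uint64_t) src;
  /* An arithmetic shift right by 63 replicates the sign bit into all
     64 bits.  It is written as a comparison here because right
     shifting a negative signed value is implementation-defined in
     ISO C.  */
  dest.hi = (src < 0) ? UINT64_MAX : 0;
  return dest;
}
```

So sign_extend_di_to_ti (-2) yields hi = all ones and lo = 0xfffffffffffffffe, and a non-negative input yields hi = 0.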
Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059
On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote: > Eliminate power8 fusion options, use power8 tuning, PR target/102059 Hi, > > The power8 fusion support used to be set automatically when -mcpu=power8 or > -mtune=power8 was used, and it was cleared for other cpu's. However, if you > used the target attribute or target #pragma to change the default cpu type or > tuning, you would get an error that a target specifiction option mismatch > occurred. > specification. (ok :-) > This occurred because the rs6000_can_inline_p function just compares the ISA > bits between the called inline function and the caller. If the ISA flags of > the called function is not a subset of the ISA flags of the caller, we won't > do > the inlinging. When a power9 or power10 function inlines a function that is > explicitly compiled for power8, the power8 function has the power8 fusion bits > set and the power9 or power10 functions do not have the fusion bits set. > inlining. > This code removes the -mpower8-fusion and -mpower8-fusion-sign options, and > only enables power8 fusion if we are tuning for a power8. Power8 sign fusion > is only enabled if we are tuning for a power8 and we have -O3 optimization or > higher. > > I left the options -mno-power8-fusion and -mno-power8-fusion-sign in > rs6000.opt > and they don't issue a warning. If the user explicitly used -mpower8-fusion > or > -mpower8-fusion-sign, then they will get a warning that the swtich has been > removed. > switch > Similarly, I left in the pragma target and attribute target support for the > fusion options, but they don't do anything now. This is because I believe the > customer who encountered this problem now is explicitly setting the > no-power8-fusion option in the pragma or attribute to avoid the warning. 
> > I have tested this on the following systems, and they all bootstraps fine and > there were no regressions in the test suite: > > big endian power8 (both 64-bit and 32-bit) > little endian power9 > little endian power10 > ok. > Can I check this patch into the current master branch for GCC and after a > cooling period check in the patch to the GCC 11 and GCC 10 branches. The > customer is currently using GCC 10. > > 2022-03-09 Michael Meissner > > gcc/ > PR target/102059 > * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete. > (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks. > (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION. ok > * config/rs6000/rs6000.cc (rs6000_option_override_internal): > Delete code that set the power8 fusion options automatically. > (rs6000_opt_masks): Allow #pragma target and attribute target to set > power8-fusion and power8-fusion-sign, but these no longer represent > options that the user can set. > (rs6000_print_options_internal): Skip printing nop options. ok > * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro. > (TARGET_P8_FUSION_SIGN): Likewise. > (MASK_P8_FUSION): Delete. ok > * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but > ignore the no form and warn that the option was removed for the regular > form. > (-mpower8-fusion-sign): Likewise. ok > * doc/invoke.texi (RS/6000 and PowerPC Options): Delete -mpower8-fusion > and -mpower8-fusion-sign. This change removes the -mpower8-fusion and -mno-power8-fusion options. There is no direct reference to -mpower8-fusion-sign in the change here; it may be an implied removal, but that is not immediately obvious to me. > > gcc/testsuite/ > PR target/102059 > * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion. > * gcc.dg/lto/pr102059-2_0.c: Likewise. > * gcc.target/powerpc/pr102059-3.c: Likewise. > * gcc.target/powerpc/pr102059-4.c: New test.
ok > --- > gcc/config/rs6000/rs6000-cpus.def | 22 +++-- > gcc/config/rs6000/rs6000.cc | 49 +-- > gcc/config/rs6000/rs6000.h| 14 +- > gcc/config/rs6000/rs6000.opt | 19 +-- > gcc/doc/invoke.texi | 13 + > gcc/testsuite/gcc.dg/lto/pr102059-1_0.c | 2 +- > gcc/testsuite/gcc.dg/lto/pr102059-2_0.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr102059-3.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 + > 9 files changed, 75 insertions(+), 71 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c > > diff --git a/gcc/config/rs6000/rs6000-cpus.def > b/gcc/config/rs6000/rs6000-cpus.def > index 963947f6939..a05b2d8c41a 100644 > --- a/gcc/config/rs6000/rs6000-cpus.def > +++ b/gcc/config/rs6000/rs6000-cpus.def > @@ -43,9 +43,7 @@ >| OPTION_MASK_ALTIVEC \ >| OPTION_MASK_VSX) > > -/* For now, don't provide an embedded version of ISA 2.07. Do n
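For readers following the inlining discussion above, the subset rule can be sketched in plain C. This is only an illustration of the check as described in the patch text ("if the ISA flags of the called function is not a subset of the ISA flags of the caller, we won't do the inlining"); it is not the actual rs6000_can_inline_p code, and the flag names and bit values below are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA flag bits, for illustration only.  These are not the
   real OPTION_MASK_* values used by the rs6000 backend.  */
#define FLAG_ALTIVEC    (UINT64_C (1) << 0)
#define FLAG_VSX        (UINT64_C (1) << 1)
#define FLAG_P8_VECTOR  (UINT64_C (1) << 2)
#define FLAG_P8_FUSION  (UINT64_C (1) << 3)
#define FLAG_P9_VECTOR  (UINT64_C (1) << 4)

/* Return true if every flag set for the callee is also set for the
   caller, i.e. the callee's flags are a subset of the caller's.  */
static bool
flags_allow_inline (uint64_t caller_flags, uint64_t callee_flags)
{
  return (caller_flags & callee_flags) == callee_flags;
}
```

With these made-up masks, a callee carrying FLAG_P8_FUSION fails the check against a caller that lacks that bit, and passes once the fusion bit is no longer part of the compared flags. That is the situation the patch is meant to avoid by tying fusion to tuning rather than to an ISA option bit.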
Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059
On Thu, 2022-03-10 at 13:49 -0600, Segher Boessenkool wrote:
> On Thu, Mar 10, 2022 at 10:44:52AM -0600, will schmidt wrote:
> > On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> > > --- a/gcc/config/rs6000/rs6000-cpus.def
> > > +++ b/gcc/config/rs6000/rs6000-cpus.def
> > > @@ -43,9 +43,7 @@
> > >    | OPTION_MASK_ALTIVEC \
> > >    | OPTION_MASK_VSX)
> > >
> > > -/* For now, don't provide an embedded version of ISA 2.07.  Do not set power8
> > > -   fusion here, instead set it in rs6000.cc if we are tuning for a power8
> > > -   system.  */
> > > +/* For now, don't provide an embedded version of ISA 2.07.  */
> >
> > ok.  (as far as removing the comment, I'm not clear what the remaining
> > comment is telling me, but that's outside of the scope of this patch).
>
> It is saying there is nothing that implements Book III-E of ISA 2.07
> (nothing in GCC, but no actual CPU either).  Or Category: Embedded even
> maybe :-)

Lol, Ok.  The small-e in embedded did not clue me in that this was
referring to the big-E Embedded category.  :-)

> It could be clearer perhaps, or just be removed completely; it might
> have been useful historically, but it isn't anymore really.

Thanks,
-Will

>
> Segher
rs6000: RFC/Update support for addg6s instruction. PR100693
Hi,

RFC/Update support for addg6s instruction.  PR100693

For PR100693, we currently provide an addg6s builtin using unsigned int
arguments, but we are missing an unsigned long long argument equivalent.
This patch adds an overload to provide the long long version of the builtin.

  unsigned long long __builtin_addg6s (unsigned long long, unsigned long long);

RFC/concerns: This patch works, but a brief look at the intermediate
stages shows it is not behaving quite as I expected.  Looking at the
intermediate dumps, I see in pr100693.original that calls I expect to be
routed to the internal __builtin_addg6s_si() that uses (unsigned int)
arguments are instead being handled by __builtin_addg6s_di() with casts
that convert the arguments to (unsigned long long).  i.e.

  return (unsigned int) __builtin_addg6s_di ((long long unsigned int) a, (long long unsigned int) b);

As a test, I see that if I swap the order of the builtins in
rs6000-overload.def I end up with code casting the ULL values to UI, which
provides truncated results, and is similar to what occurs today without
this patch.

All that said, this patch seems to work.  OK for next stage 1?
Tested on power8BE as well as LE power8,power9,power10.

2022-03-15  Will Schmidt

gcc/
	PR target/100693
	* config/rs6000/rs6000-builtins.def: Remove entry for __builtin_addg6s()
	and add entries for __builtin_addg6s_di() and __builtin_addg6s_si().
	* config/rs6000/rs6000-overload.def: Add overloaded entries allowing
	__builtin_addg6s() to map to either of the __builtin_addg6s_{di,si}
	builtins.
	* config/rs6000/rs6000.md: Add UNSPEC_ADDG6S_SI and UNSPEC_ADDG6S_DI
	unspecs.  Add define_insn entries for addg6s_si and addg6s_di based
	on those unspecs.
	* doc/extend.texi: Add entry for ULL __builtin_addg6s (ULL, ULL);

testsuite/
	PR target/100693
	* gcc.target/powerpc/pr100693.c: New test.
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index ae2760c33389..4c23cac26932 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1993,12 +1993,16 @@ XXSPLTD_V2DI vsx_xxspltd_v2di {} ; Power7 builtins (ISA 2.06). [power7] - const unsigned int __builtin_addg6s (unsigned int, unsigned int); -ADDG6S addg6s {} + const unsigned long long __builtin_addg6s_di (unsigned long long, \ + unsigned long long); +ADDG6S_DI addg6s_di {} + + const unsigned int __builtin_addg6s_si (unsigned int, unsigned int); +ADDG6S_SI addg6s_si {} const signed long __builtin_bpermd (signed long, signed long); BPERMD bpermd_di {32bit} const unsigned int __builtin_cbcdtd (unsigned int); diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 44e2945aaa0e..931f85b738c5 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -76,10 +76,15 @@ ; Blank lines may be used as desired in this file between the lines as ; defined above; that is, you can introduce as many extra newlines as you ; like after a required newline, but nowhere else. Lines beginning with ; a semicolon are also treated as blank lines. 
+[ADDG6S, __builtin_i_addg6s, __builtin_addg6s] + unsigned long long __builtin_addg6s_di (signed long long, unsigned long long); +ADDG6S_DI + unsigned int __builtin_addg6s_si (unsigned int, unsigned int); +ADDG6S_SI [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd] vsq __builtin_vec_bcdadd (vsq, vsq, const int); BCDADD_V1TI vuc __builtin_vec_bcdadd (vuc, vuc, const int); diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index fdfbc6566a5c..d040f127eb55 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -122,11 +122,12 @@ (define_c_enum "unspec" UNSPEC_P8V_MTVSRWZ UNSPEC_P8V_RELOAD_FROM_GPR UNSPEC_P8V_MTVSRD UNSPEC_P8V_XXPERMDI UNSPEC_P8V_RELOAD_FROM_VSX - UNSPEC_ADDG6S + UNSPEC_ADDG6S_SI + UNSPEC_ADDG6S_DI UNSPEC_CDTBCD UNSPEC_CBCDTD UNSPEC_DIVE UNSPEC_DIVEU UNSPEC_UNPACK_128BIT @@ -14495,15 +14496,24 @@ (define_peephole2 operands[5] = change_address (mem, mode, new_addr); }) ;; Miscellaneous ISA 2.06 (power7) instructions -(define_insn "addg6s" +(define_insn "addg6s_si" [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI [(match_operand:SI 1 "register_operand" "r") (match_operand:SI 2 "register_operand" "r")] - UNSPEC_ADDG6S))] + UNSPEC_ADDG6S_SI))] + "TARGET_POPCNTD" + "addg6s %0,%1,%2" + [(set_attr "type" "integer")]) + +(define_insn "addg6s_di" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec:DI [(match_operand:DI 1 "register_operand" "r") + (match_operand:DI 2 "register_operand" "r")] + UNSPEC_
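The two routings described in the RFC text (32-bit calls handled by the _di variant with widening casts, versus 64-bit calls handled by the _si variant with narrowing casts) differ in whether information is lost. The following is a minimal sketch of just the conversion behaviour. The op_di/op_si functions are hypothetical stand-ins that simply add their arguments; they do not model the addg6s instruction, and nothing here shows that the widened route is correct for addg6s itself.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a 64-bit and a 32-bit builtin variant.
   They just add, so the effect of the casts is easy to see; this is
   NOT the addg6s semantics.  */
static uint64_t
op_di (uint64_t a, uint64_t b)
{
  return a + b;
}

static uint32_t
op_si (uint32_t a, uint32_t b)
{
  return a + b;
}

/* 32-bit call routed through the 64-bit variant: the arguments are
   widened (value-preserving) and the result is narrowed.  */
static uint32_t
call_si_via_di (uint32_t a, uint32_t b)
{
  return (uint32_t) op_di ((uint64_t) a, (uint64_t) b);
}

/* 64-bit call routed through the 32-bit variant: the arguments are
   truncated before the operation, so the upper 32 bits are lost.  */
static uint64_t
call_di_via_si (uint64_t a, uint64_t b)
{
  return (uint64_t) op_si ((uint32_t) a, (uint32_t) b);
}
```

For this stand-in operation, the widened route gives the same 32-bit result as calling op_si directly, while the narrowed route drops any input bits above bit 31. That matches the "truncated results" observation when the overload order is swapped.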
Re: rs6000: RFC/Update support for addg6s instruction. PR100693
On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote: > Hi! > > On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote: > > For PR100693, we currently provide an addg6s builtin using unsigned > > int arguments, but we are missing an unsigned long long argument > > equivalent. This patch adds an overload to provide the long long > > version of the builtin. > > > > unsigned long long __builtin_addg6s (unsigned long long, unsigned > > long long); > > > > RFC/concerns: This patch works, but looking briefly at intermediate > > stages > > is not behaving quite as I expected. Looking at the intermediate > > dumps, I > > see in pr100693.original that calls I expect to be routed to the > > internal > > __builtin_addg6s_si() that uses (unsigned int) arguments are > > instead being > > handled by __builtin_addg6s_di() with casts that convert the > > arguments to > > (unsigned long long). > > Did you test with actual 32-bit variables, instead of just function > arguments? Function arguments are always passed in (sign-extended) > registers. > > Like, > > unsigned int f(unsigned int *a, unsigned int *b) > { > return __builtin_addg6s(*a, *b); > } I perhaps missed that subtlety. I'll investigate that further. > > > As a test, I see if I swap the order of the builtins in rs6000- > > overload.def > > I end up with code casting the ULL values to UI, which provides > > truncated > > results, and is similar to what occurs today without this patch. > > > > All that said, this patch seems to work. OK for next stage 1? > > Tested on power8BE as well as LE power8,power9,power10. > > Please ask again when stage 1 has started? > > > gcc/ > > PR target/100693 > > * config/rs6000/rs600-builtins.def: Remove entry for > > __builtin_addgs() > > and add entries for __builtin_addg6s_di() and > > __builtin_addg6s_si(). > > Indent of second and further lines should be at the "*", not two > spaces > after that. 
> > > - UNSPEC_ADDG6S > > + UNSPEC_ADDG6S_SI > > + UNSPEC_ADDG6S_DI > > You do not need multiple unspec numbers. You can differentiate them > based on the modes of the arguments, already :-) > > > ;; Miscellaneous ISA 2.06 (power7) instructions > > -(define_insn "addg6s" > > +(define_insn "addg6s_si" > >[(set (match_operand:SI 0 "register_operand" "=r") > > (unspec:SI [(match_operand:SI 1 "register_operand" "r") > > (match_operand:SI 2 "register_operand" "r")] > > - UNSPEC_ADDG6S))] > > + UNSPEC_ADDG6S_SI))] > > + "TARGET_POPCNTD" > > + "addg6s %0,%1,%2" > > + [(set_attr "type" "integer")]) > > + > > +(define_insn "addg6s_di" > > + [(set (match_operand:DI 0 "register_operand" "=r") > > + (unspec:DI [(match_operand:DI 1 "register_operand" "r") > > + (match_operand:DI 2 "register_operand" "r")] > > + UNSPEC_ADDG6S_DI))] > >"TARGET_POPCNTD" > >"addg6s %0,%1,%2" > >[(set_attr "type" "integer")]) > > (define_insn "addg6s" > [(set (match_operand:GPR 0 "register_operand" "=r") > (unspec:GPR [(match_operand:GPR 1 "register_operand" "r") >(match_operand:GPR 2 "register_operand" "r")] > UNSPEC_ADDG6S))] > "TARGET_POPCNTD" > "addg6s %0,%1,%2" > [(set_attr "type" "integer")]) > You do not need multiple unspec numbers. You can differentiate them > based on the modes of the arguments, already :-) Yeah, Thats what I thought, which is a big part of why I posted this with RFC. :-)When I attempted this there was an issue with multiple s (behind the GPR predicate) versus the singular "addg6s" define_insn. It's possible I had something else wrong there, but I'll go back to that attempt and work in that direction. > > We do not want DI (here, and in most places) for -m32! > > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c > > @@ -0,0 +1,68 @@ > > +/* { dg-do compile { target { powerpc*-*-linux* } } } */ > > Why only on Linux? > > > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > > Why not on Darwin? 
And why skip it anyway, given the previous line > :-) > > > +/* { dg-require-effective-target powerpc_vsx_ok } */ > > That is the wrong requirement. You want to test for Power7, not for > VSX. I realise you probably copied this from elsewhere :-( (If from > another addg6s testcase, just keep it). Because reasons. :-) The stanzas are copied from the nearby bcd-1.c testcase that has a simpler test for addg6s. Given the input, I'll try to correct the stanzas here and limit how much error I carry along. Thanks for the feedback and review. I'll investigate further, and resubmit at stage1. Thanks, -Will > > > Segher
Re: [PATCHv2, rs6000] Add V1TI into vector comparison expand [PR103316]
On Thu, 2022-03-17 at 13:35 +0800, HAO CHEN GUI via Gcc-patches wrote: > Hi, >This patch adds V1TI mode into a new mode iterator used in vector > comparison expands.With the patch, both built-ins and direct > comparison > could generate P10 new V1TI comparison instructions. Hi, -/* We deliberately omit RS6000_BIF_CMPGE_1TI ... - for now, because gimple folding produces worse code for 128-bit - compares. */ I assume it is the case, but don't see a before/after example to clarify the situation. A clear statement that the 'worse code' situation has been resolved with this addition of TI modes into the iterators, would be good. Otherwise lgtm. :-) Thanks, -Will > >Bootstrapped and tested on ppc64 Linux BE and LE with no > regressions. > Is this okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > 2022-03-16 Haochen Gui > > gcc/ > PR target/103316 > * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable > gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET, > RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT, > RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI. > * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10 > V1TI instructions. > (vec_cmp): Set mode iterator to VEC_IC. > (vec_cmpu): Likewise. > > gcc/testsuite/ > PR target/103316 > * gcc.target/powerpc/pr103316.c: New. > * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector > __int128. > > patch.diff > diff --git a/gcc/config/rs6000/rs6000-builtin.cc > b/gcc/config/rs6000/rs6000-builtin.cc > index 5d34c1bcfc9..fac7f43f438 100644 > --- a/gcc/config/rs6000/rs6000-builtin.cc > +++ b/gcc/config/rs6000/rs6000-builtin.cc > @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin > (gimple_stmt_iterator *gsi) > case RS6000_BIF_VCMPEQUH: > case RS6000_BIF_VCMPEQUW: > case RS6000_BIF_VCMPEQUD: > -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because > gimple > - folding produces worse code for 128-bit compares. 
*/ > +case RS6000_BIF_VCMPEQUT: >fold_compare_helper (gsi, EQ_EXPR, stmt); >return true; > > case RS6000_BIF_VCMPNEB: > case RS6000_BIF_VCMPNEH: > case RS6000_BIF_VCMPNEW: > -/* We deliberately omit RS6000_BIF_VCMPNET for now, because > gimple > - folding produces worse code for 128-bit compares. */ > +case RS6000_BIF_VCMPNET: >fold_compare_helper (gsi, NE_EXPR, stmt); >return true; > > @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin > (gimple_stmt_iterator *gsi) > case RS6000_BIF_CMPGE_U4SI: > case RS6000_BIF_CMPGE_2DI: > case RS6000_BIF_CMPGE_U2DI: > -/* We deliberately omit RS6000_BIF_CMPGE_1TI and > RS6000_BIF_CMPGE_U1TI > - for now, because gimple folding produces worse code for 128- > bit > - compares. */ > +case RS6000_BIF_CMPGE_1TI: > +case RS6000_BIF_CMPGE_U1TI: >fold_compare_helper (gsi, GE_EXPR, stmt); >return true; > > @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin > (gimple_stmt_iterator *gsi) > case RS6000_BIF_VCMPGTUW: > case RS6000_BIF_VCMPGTUD: > case RS6000_BIF_VCMPGTSD: > -/* We deliberately omit RS6000_BIF_VCMPGTUT and > RS6000_BIF_VCMPGTST > - for now, because gimple folding produces worse code for 128- > bit > - compares. */ > +case RS6000_BIF_VCMPGTUT: > +case RS6000_BIF_VCMPGTST: >fold_compare_helper (gsi, GT_EXPR, stmt); >return true; > > @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin > (gimple_stmt_iterator *gsi) > case RS6000_BIF_CMPLE_U4SI: > case RS6000_BIF_CMPLE_2DI: > case RS6000_BIF_CMPLE_U2DI: > -/* We deliberately omit RS6000_BIF_CMPLE_1TI and > RS6000_BIF_CMPLE_U1TI > - for now, because gimple folding produces worse code for 128- > bit > - compares. 
*/ > +case RS6000_BIF_CMPLE_1TI: > +case RS6000_BIF_CMPLE_U1TI: >fold_compare_helper (gsi, LE_EXPR, stmt); >return true; > > diff --git a/gcc/config/rs6000/vector.md > b/gcc/config/rs6000/vector.md > index b87a742cca8..d88869cc8d0 100644 > --- a/gcc/config/rs6000/vector.md > +++ b/gcc/config/rs6000/vector.md > @@ -26,6 +26,9 @@ > ;; Vector int modes > (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI]) > > +;; Vector int modes for comparison > +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI > "TARGET_POWER10")]) > + > ;; 128-bit int modes > (define_mode_iterator VEC_TI [V1TI TI]) > > @@ -533,10 +536,10 @@ (define_expand "vcond_mask_" > > ;; For signed integer vectors comparison. > (define_expand "vec_cmp" > - [(set (match_operand:VEC_I 0 "vint_operand") > + [(set (match_operand:VEC_IC 0 "vint_operand") > (match_operator 1 "signed_or_equality_comparison_operator" > - [(match_operand:VEC_I 2 "vint_operand") > -(match_operand:VEC_I 3 "vint_operand")]))] > +
Re: [PATCH v3, rs6000] Add V1TI into vector comparison expand [PR103316]
On Mon, 2022-03-21 at 09:51 +0800, HAO CHEN GUI wrote: > Hi, >This patch adds V1TI mode into a new mode iterator used in vector > comparison expands.Without the patch, the comparisons between two vector > __int128 are converted to scalar comparisons with branches. The code is > suboptimal.The patch fixes the issue. Now all comparisons between two > vector __int128 generates P10 new comparison instructions. Also the > relative built-ins generate the same instructions after gimple folding. > So they're added back to the list. > Hi, Thanks for reworking the description, this clears up my uncertainty. :-) A few spots where spaces should be added after periods. No need to re-post for just that. Patch content otherwise seems OK to me, though I defer to others for any subtleties with actual VEC_IC related changes, Thanks -Will >Bootstrapped and tested on ppc64 Linux BE and LE with no regressions. > Is this okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > 2022-03-16 Haochen Gui > > gcc/ > PR target/103316 > * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable > gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET, > RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT, > RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI. > * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10 > V1TI instructions. > (vec_cmp): Set mode iterator to VEC_IC. > (vec_cmpu): Likewise. > > gcc/testsuite/ > PR target/103316 > * gcc.target/powerpc/pr103316.c: New. > * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector > __int128. 
> > patch.diff > diff --git a/gcc/config/rs6000/rs6000-builtin.cc > b/gcc/config/rs6000/rs6000-builtin.cc > index 5d34c1bcfc9..fac7f43f438 100644 > --- a/gcc/config/rs6000/rs6000-builtin.cc > +++ b/gcc/config/rs6000/rs6000-builtin.cc > @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) > case RS6000_BIF_VCMPEQUH: > case RS6000_BIF_VCMPEQUW: > case RS6000_BIF_VCMPEQUD: > -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple > - folding produces worse code for 128-bit compares. */ > +case RS6000_BIF_VCMPEQUT: >fold_compare_helper (gsi, EQ_EXPR, stmt); >return true; > > case RS6000_BIF_VCMPNEB: > case RS6000_BIF_VCMPNEH: > case RS6000_BIF_VCMPNEW: > -/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple > - folding produces worse code for 128-bit compares. */ > +case RS6000_BIF_VCMPNET: >fold_compare_helper (gsi, NE_EXPR, stmt); >return true; > > @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) > case RS6000_BIF_CMPGE_U4SI: > case RS6000_BIF_CMPGE_2DI: > case RS6000_BIF_CMPGE_U2DI: > -/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI > - for now, because gimple folding produces worse code for 128-bit > - compares. */ > +case RS6000_BIF_CMPGE_1TI: > +case RS6000_BIF_CMPGE_U1TI: >fold_compare_helper (gsi, GE_EXPR, stmt); >return true; > > @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) > case RS6000_BIF_VCMPGTUW: > case RS6000_BIF_VCMPGTUD: > case RS6000_BIF_VCMPGTSD: > -/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST > - for now, because gimple folding produces worse code for 128-bit > - compares. 
*/ > +case RS6000_BIF_VCMPGTUT: > +case RS6000_BIF_VCMPGTST: >fold_compare_helper (gsi, GT_EXPR, stmt); >return true; > > @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) > case RS6000_BIF_CMPLE_U4SI: > case RS6000_BIF_CMPLE_2DI: > case RS6000_BIF_CMPLE_U2DI: > -/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI > - for now, because gimple folding produces worse code for 128-bit > - compares. */ > +case RS6000_BIF_CMPLE_1TI: > +case RS6000_BIF_CMPLE_U1TI: >fold_compare_helper (gsi, LE_EXPR, stmt); >return true; > > diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md > index b87a742cca8..d88869cc8d0 100644 > --- a/gcc/config/rs6000/vector.md > +++ b/gcc/config/rs6000/vector.md > @@ -26,6 +26,9 @@ > ;; Vector int modes > (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI]) > > +;; Vector int modes for comparison > +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")]) > + > ;; 128-bit int modes > (define_mode_iterator VEC_TI [V1TI TI]) > > @@ -533,10 +536,10 @@ (define_expand "vcond_mask_" > > ;; For signed integer vectors comparison. > (define_expand "vec_cmp" > - [(set (match_operand:VEC_I 0 "vint_operand") > + [(set (match_operand:VEC_IC 0 "vint_operand") > (match_operator 1 "signed_or_equality_comparison_operator"
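As a rough scalar model of what a one-element vector __int128 comparison computes: the result element is all ones when the comparison holds and all zeros otherwise, and it can then be used as a select mask without a branch. The sketch below assumes a compiler that provides unsigned __int128 (GCC or Clang on 64-bit targets). The function names are invented for the example, and it is plain scalar C, not the vector code the patch generates.

```c
/* Assumes the compiler supports unsigned __int128.  */
typedef unsigned __int128 u128;

/* All-ones if a == b, else zero.  */
static u128
cmp_eq_mask (u128 a, u128 b)
{
  return -(u128) (a == b);
}

/* All-ones if a > b (unsigned), else zero.  */
static u128
cmp_gtu_mask (u128 a, u128 b)
{
  return -(u128) (a > b);
}

/* Pick x where the mask bits are set and y elsewhere, without
   branching on the comparison result.  */
static u128
select_by_mask (u128 mask, u128 x, u128 y)
{
  return (x & mask) | (y & ~mask);
}
```

The mask form is what makes the comparison result directly usable by a following select, as opposed to the scalar compare-and-branch sequence mentioned in the patch description.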
Re: [PATCH 8/8] rs6000: Fix some missing built-in attributes [PR104004]
On Fri, 2022-01-28 at 11:50 -0600, Bill Schmidt via Gcc-patches wrote: > PR104004 caught some misses on my part in converting to the new > built-in > function infrastructure. In particular, I forgot to mark all of the > "nosoft" > built-ins, and one of those should also have been marked "no32bit". > > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. > Is this okay for trunk? > > Thanks, > Bill > Hi, The patch here seems reasonable to me. There are comments/subsequent pings that include commentary about additional test coverage. I see all of the builtins referenced here appear to be touched by the existing test gcc.target/powerpc/test_fpscr_drn_builtin.c . I could create a variation of that test forcing ! hard_dfp in case that would help, though i'm uncertain the value there. Thanks -Will > > 2022-01-27 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-builtin.def (MFFSL): Mark nosoft. > (MTFSB0): Likewise. > (MTFSB1): Likewise. > (SET_FPSCR_RN): Likewise. > (SET_FPSCR_DRN): Mark nosoft and no32bit. > --- > gcc/config/rs6000/rs6000-builtins.def | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index c8f0cf332eb..98619a649e3 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -215,7 +215,7 @@ > ; processors, this builtin automatically falls back to mffs on older > ; platforms. Thus it appears here in the [always] stanza. >double __builtin_mffsl (); > -MFFSL rs6000_mffsl {} > +MFFSL rs6000_mffsl {nosoft} > > ; This is redundant with __builtin_pack_ibm128, as it requires long > ; double to be __ibm128. Should probably be deprecated. 
> @@ -226,10 +226,10 @@ > MFTB rs6000_mftb_di {32bit} > >void __builtin_mtfsb0 (const int<0,31>); > -MTFSB0 rs6000_mtfsb0 {} > +MTFSB0 rs6000_mtfsb0 {nosoft} > >void __builtin_mtfsb1 (const int<0,31>); > -MTFSB1 rs6000_mtfsb1 {} > +MTFSB1 rs6000_mtfsb1 {nosoft} > >void __builtin_mtfsf (const int<0,255>, double); > MTFSF rs6000_mtfsf {} > @@ -238,7 +238,7 @@ > PACK_IF packif {} > >void __builtin_set_fpscr_rn (const int[0,3]); > -SET_FPSCR_RN rs6000_set_fpscr_rn {} > +SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft} > >const double __builtin_unpack_ibm128 (__ibm128, const int<0,1>); > UNPACK_IF unpackif {} > @@ -2969,7 +2969,7 @@ > PACK_TD packtd {} > >void __builtin_set_fpscr_drn (const int[0,7]); > -SET_FPSCR_DRN rs6000_set_fpscr_drn {} > +SET_FPSCR_DRN rs6000_set_fpscr_drn {nosoft,no32bit} > >const unsigned long long __builtin_unpack_dec128 (_Decimal128, \ > const int<0,1>);
Re: [PATCH] rs6000: Adjust mov optabs for opaque modes [PR103353]
On Thu, 2022-03-03 at 16:38 +0800, Kewen.Lin via Gcc-patches wrote: > Hi, > Hi > As PR103353 shows, we may want to continue to expand a MMA built-in > function like a normal function, even if we have already emitted > error messages about some missing required conditions. As shown in > that PR, without one explicit mov optab on OOmode provided, it would > call emit_move_insn recursively. > > So this patch is to allow the mov pattern to be generated when we are > expanding to RTL and have seen errors even without MMA supported, it's > expected that the generated pattern would not cause further ICEs as the > compilation would stop soon after expanding. Is there a testcase, new or existing, that illustrates this error path? > > Bootstrapped and regtested on powerpc64-linux-gnu P8 and > powerpc64le-linux-gnu P9 and P10. > > Is it ok for trunk? > > BR, > Kewen > -- > > PR target/103353 > > gcc/ChangeLog: > > * config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition > check to preparation statements and add handlings for !TARGET_MMA. > (define_expand movxo): Likewise. 
> > --- > > gcc/config/rs6000/mma.md | 42 ++-- > > 1 file changed, 36 insertions(+), 6 deletions(-) > > > > diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md > > index 907c9d6d516..f76a87b4a21 100644 > > --- a/gcc/config/rs6000/mma.md > > +++ b/gcc/config/rs6000/mma.md > > @@ -268,10 +268,25 @@ (define_int_attr avvi4i4i4 > > [(UNSPEC_MMA_PMXVI8GER4PP "pmxvi8ger4pp") > > (define_expand "movoo" > >[(set (match_operand:OO 0 "nonimmediate_operand") > > (match_operand:OO 1 "input_operand"))] > > - "TARGET_MMA" > > + "" > > { > > - rs6000_emit_move (operands[0], operands[1], OOmode); > > - DONE; > > + if (TARGET_MMA) { > > +rs6000_emit_move (operands[0], operands[1], OOmode); > > +DONE; > > + } > > + /* Opaque modes are only expected to be available when MMA is supported, > > + but PR103353 shows we may want to continue to expand a MMA built-in > > + function like a normal function, even if we have already emitted > > + error messages about some missing required conditions. perhaps drop "like a normal function". > > + As shown in that PR, without one explicit mov optab on OOmode > > provided, > > + it would call emit_move_insn recursively. So we allow this pattern to > > + be generated when we are expanding to RTL and have seen errors, even > > + though there is no MMA support. It would not cause further ICEs as > > + the compilation would stop soon after expanding. */ Testcase would be particularly helpful to illustrate this, i think. 
Thanks, -Will > > + else if (currently_expanding_to_rtl && seen_error ()) > > +; > > + else > > +gcc_unreachable (); > > }) > > > > (define_insn_and_split "*movoo" > > @@ -300,10 +315,25 @@ (define_insn_and_split "*movoo" > > (define_expand "movxo" > >[(set (match_operand:XO 0 "nonimmediate_operand") > > (match_operand:XO 1 "input_operand"))] > > - "TARGET_MMA" > > + "" > > { > > - rs6000_emit_move (operands[0], operands[1], XOmode); > > - DONE; > > + if (TARGET_MMA) { > > +rs6000_emit_move (operands[0], operands[1], XOmode); > > +DONE; > > + } > > + /* Opaque modes are only expected to be available when MMA is supported, > > + but PR103353 shows we may want to continue to expand a MMA built-in > > + function like a normal function, even if we have already emitted > > + error messages about some missing required conditions. > > + As shown in that PR, without one explicit mov optab on OOmode > > provided, > > + it would call emit_move_insn recursively. So we allow this pattern to > > + be generated when we are expanding to RTL and have seen errors, even > > + though there is no MMA support. It would not cause further ICEs as > > + the compilation would stop soon after expanding. */ > > + else if (currently_expanding_to_rtl && seen_error ()) > > +; > > + else > > +gcc_unreachable (); > > }) > > > > (define_insn_and_split "*movxo" > > -- > > 2.25.1 > >
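The three-way decision in the new movoo/movxo preparation statements can be modelled in isolation as below. This is a hedged sketch of the control flow only: model_mov_expand and the enum are hypothetical names, the boolean parameters stand in for TARGET_MMA, currently_expanding_to_rtl and seen_error (), and abort () stands in for gcc_unreachable ().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical outcome labels for the model.  */
enum expand_result
{
  EXPAND_DONE,        /* move emitted through the normal helper */
  EXPAND_FALLTHROUGH  /* tolerated: let the pattern itself be generated */
};

/* Model of the expander's decision; not GCC code.  */
static enum expand_result
model_mov_expand (bool target_mma, bool expanding_to_rtl, bool error_seen)
{
  if (target_mma)
    return EXPAND_DONE;
  else if (expanding_to_rtl && error_seen)
    /* An error was already reported, so compilation will stop soon;
       do nothing special here.  */
    return EXPAND_FALLTHROUGH;
  else
    /* No MMA and no prior error: this should never be reached.  */
    abort ();
}
```

The point of the middle branch is that it is only taken after a diagnostic has been issued, so it never produces a silently wrong result for a successful compile.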
Re: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.
On Wed, 2022-04-06 at 14:21 -0400, Michael Meissner wrote: > From bf51c49f1481001c7b3223474d261dcbf9365eda Mon Sep 17 00:00:00 2001 > From: Michael Meissner > Date: Fri, 1 Apr 2022 22:27:13 -0400 > Subject: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation. > Hi, > This pattern adds zero_extendditi2 so that if we are extending DImode to > TImode, and we want the result in a vector register, the compiler can > generate MTVSRDDD. > > In addition the patterns for generating lxvr{b,h,w,d}x were tuned to allow > loading to gpr registers. This prevents needlessly doing direct moves to > get the value into the vector registers if the gpr register was already > selected. ok > > In updating the insn counts for two tests due to these changes, I noticed > the tests were done at -O0. I changed this so that the tests are now done > at the normal -O2 optimization level. Per the comments (which you fixed up later in patch), I note they were deliberately done at -O0 since under higher optimizations gcc would generate other load instructions during those tests. Presumably with these changes that is no longer the case. :-) > > I have tested this patch with bootstrap builds and running the regression > testsuite using this patch on: > > Little endian power10, --with-cpu=power10 > Little endian power9, --with-cpu=power9 > Big endian power8, --with-cpu=power8 (both 64/32-bit tests done). > > There were no regressions. Can I check this into the master branch? > > 2022-04-06 Michael Meissner > > gcc/ > * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading to > GPR registers. > (vsx_stxvrx): Add support for storing from GPR registers. > (zero_extendditi2): New insn. > > gcc/testsuite/ > * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2 > instead of -O0 and update insn counts. > * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise. > * gcc.target/powerpc/zero-extend-di-ti.c: New test. 
> > --- > gcc/config/rs6000/vsx.md | 82 +-- > .../powerpc/vsx-load-element-extend-int.c | 36 > .../powerpc/vsx-load-element-extend-short.c | 35 > .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++ > 4 files changed, 164 insertions(+), 51 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c > > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index c091e5e2f47..ad971e3a1de 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_" > } > }) > > -;; Load rightmost element from load_data > -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx. > -(define_insn "vsx_lxvrx" > - [(set (match_operand:TI 0 "vsx_register_operand" "=wa") > - (zero_extend:TI (match_operand:INT_ISA3 1 "memory_operand" "Z")))] > - "TARGET_POWER10" > - "lxvrx %x0,%y1" > - [(set_attr "type" "vecload")]) > +;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, > lxvrdx. > +;; Support TImode being in a GPR register to prevent generating lvxr{d,w,b}x > +;; and then two direct moves if we ultimately need the value in a GPR > register. > +(define_insn_and_split "vsx_lxvrx" > + [(set (match_operand:TI 0 "register_operand" "=r,wa") > + (zero_extend:TI (match_operand:INT_ISA3 1 "memory_operand" "m,Z")))] > + "TARGET_POWERPC64 && TARGET_POWER10" > + "@ > + # > + lxvrx %x0,%y1" > + "&& reload_completed && int_reg_operand (operands[0], TImode)" > + [(set (match_dup 2) (match_dup 3)) > + (set (match_dup 4) (const_int 0))] > +{ > + rtx op0 = operands[0]; > + rtx op1 = operands[1]; > + > + operands[2] = gen_lowpart (DImode, op0); > + operands[3] = (mode == DImode > + ? op1 > + : gen_rtx_ZERO_EXTEND (DImode, op1)); > + > + operands[4] = gen_highpart (DImode, op0); > +} > + [(set_attr "type" "load,vecload") > + (set_attr "num_insns" "2,*")]) > > ;; Store rightmost element into store_data > ;; using stxvrbx, stxvrhx, strvxwx, strvxdx. 
> @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di" >DONE; > }) > > +;; Zero extend DI to TI. If we don't have the MTVSRDD instruction (and > LXVRDX > +;; in the case of power10), we use the machine independent code. If we are > +;; loading up GPRs, we fall back to the old code. In this context it's not clear what is the "old code" ? The mtvsrdd instruction is referenced in this code path. I see no direct reference to lxvrdx here, though I suppose it's assumed somewhere behind the emit_ calls. > +(define_insn_and_split "zero_extendditi2" > + [(set (match_operand:TI 0 "register_operand" "=r,r, > wa,&wa") > + (zero_extend:TI (match_operand:DI 1 "register_operand" "r,wa,r, > wa")))] > + "TARGET_POWERPC64 && TARGET_P9_VECTOR" > + "@ > + # > + # > + mtvsrdd %x0,0,%1 > + #" > + "&& reload_completed > + && (int_reg_op
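The GPR split in the vsx_lxvrx pattern above (low part set to the possibly zero-extended value, high part set to zero) corresponds to the usual two-register view of a zero-extended 128-bit value. A small model in C, assuming a compiler with unsigned __int128; the struct and function names are invented for the example.

```c
#include <stdint.h>

/* Assumes the compiler supports unsigned __int128.  */
typedef unsigned __int128 u128;

/* A 128-bit value held as two 64-bit halves, as it would be in a GPR
   pair.  */
struct ti_halves
{
  uint64_t lo;
  uint64_t hi;
};

/* Zero-extend a 64-bit value to 128 bits: the low half is the value,
   the high half is zero.  */
static struct ti_halves
zero_extend_di_ti (uint64_t x)
{
  struct ti_halves r = { x, 0 };
  return r;
}

/* Reassemble the halves into a single 128-bit value for checking.  */
static u128
halves_to_u128 (struct ti_halves h)
{
  return ((u128) h.hi << 64) | h.lo;
}
```

Narrower memory operands follow the same shape: the value is first zero-extended to 64 bits for the low half, and the high half is still zero.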
Re: [PATCH v2] rs6000: Adjust mov optabs for opaque modes [PR103353]
On Thu, 2022-04-07 at 17:29 +0800, Kewen.Lin wrote: > Hi, > > As PR103353 shows, we may want to continue to expand a MMA built-in > function like a normal function, even if we have already emitted > error messages about some missing required conditions. As shown in > that PR, without one explicit mov optab on OOmode provided, it would > call emit_move_insn recursively. > > So this patch is to allow the mov pattern to be generated when we are > expanding to RTL and have seen errors even without MMA supported, it's > expected that the generated pattern would not cause further ICEs as the > compilation would stop soon after expanding. > > Bootstrapped and regtested on powerpc64-linux-gnu P8 and > powerpc64le-linux-gnu P9 and P10. > > v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591150.html > > v2: Polish some comments and add one test case as Will and Peter suggested. Thanks. > > Is it ok for trunk or upcoming stage1? > > BR, > Kewen > -- > > PR target/103353 > > gcc/ChangeLog: > > * config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition > check to preparation statements and add handlings for !TARGET_MMA. > (define_expand movxo): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr103353.c: New test. 
> --- > gcc/config/rs6000/mma.md| 42 ++--- > gcc/testsuite/gcc.target/powerpc/pr103353.c | 22 +++ > 2 files changed, 58 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103353.c > > diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md > index 907c9d6d516..746a77a0957 100644 > --- a/gcc/config/rs6000/mma.md > +++ b/gcc/config/rs6000/mma.md > @@ -268,10 +268,25 @@ (define_int_attr avvi4i4i4 > [(UNSPEC_MMA_PMXVI8GER4PP "pmxvi8ger4pp") > (define_expand "movoo" >[(set (match_operand:OO 0 "nonimmediate_operand") > (match_operand:OO 1 "input_operand"))] > - "TARGET_MMA" > + "" > { > - rs6000_emit_move (operands[0], operands[1], OOmode); > - DONE; > + if (TARGET_MMA) { > +rs6000_emit_move (operands[0], operands[1], OOmode); > +DONE; > + } > + /* Opaque modes are only expected to be available when MMA is supported, > + but PR103353 shows we may want to continue to expand a MMA built-in > + function, even if we have already emitted error messages about some > + missing required conditions. As shown in that PR, without one > + explicit mov optab on OOmode provided, it would call emit_move_insn > + recursively. So we allow this pattern to be generated when we are > + expanding to RTL and have seen errors, even though there is no MMA > + support. It would not cause further ICEs as the compilation would > + stop soon after expanding. 
*/ > + else if (currently_expanding_to_rtl && seen_error ()) > +; > + else > +gcc_unreachable (); > }) ok > > (define_insn_and_split "*movoo" > @@ -300,10 +315,25 @@ (define_insn_and_split "*movoo" > (define_expand "movxo" >[(set (match_operand:XO 0 "nonimmediate_operand") > (match_operand:XO 1 "input_operand"))] > - "TARGET_MMA" > + "" > { > - rs6000_emit_move (operands[0], operands[1], XOmode); > - DONE; > + if (TARGET_MMA) { > +rs6000_emit_move (operands[0], operands[1], XOmode); > +DONE; > + } > + /* Opaque modes are only expected to be available when MMA is supported, > + but PR103353 shows we may want to continue to expand a MMA built-in > + function, even if we have already emitted error messages about some > + missing required conditions. As shown in that PR, without one > + explicit mov optab on XOmode provided, it would call emit_move_insn > + recursively. So we allow this pattern to be generated when we are > + expanding to RTL and have seen errors, even though there is no MMA > + support. It would not cause further ICEs as the compilation would > + stop soon after expanding. */ > + else if (currently_expanding_to_rtl && seen_error ()) > +; > + else > +gcc_unreachable (); > }) ok > > (define_insn_and_split "*movxo" > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103353.c > b/gcc/testsuite/gcc.target/powerpc/pr103353.c > new file mode 100644 > index 000..6b0bedbb958 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr103353.c > @@ -0,0 +1,22 @@ > +/* { dg-require-effective-target powerpc_altivec_ok } */ > +/* If the default cpu type is power10 or later, MMA is enabled by default. > + To keep the test point available all the time, this case specifies > + -mdejagnu-cpu=power6 to make it be tested without MMA. */ > +/* { dg-options "-maltivec -mdejagnu-cpu=power6" } */ > + > +/* Verify there is no ICE and don't check the error messages on MMA > + requirement since they could be fragile and are not test points > + of this case. 
*/ > + > +void > +foo (__vector_pair *dst, double *x) > +{ > + dst[0] = __builtin_vsx_
Re: [PATCH] Disable float128 tests on VxWorks, PR target/104253.
On Thu, 2022-04-07 at 06:00 -0500, Segher Boessenkool wrote:
> On Thu, Apr 07, 2022 at 12:29:45AM -0400, Michael Meissner wrote:
> > In PR target/104253, it was pointed out that the test case added as part
> > of fixing the PR does not work on VxWorks because float128 is not
> > supported on that system. I have modified the three tests for float128 so
> > that they are manually excluded on VxWorks systems. In looking at the
> > code, I also added checks in check_effective_target_ppc_ieee128_ok to
> > disable the systems that will never support VSX instructions which are
> > required for float128 support (eabi, eabispe, darwin).
>
> It's just one extra to the big list here, but, why do we need all these
> manual exclusions anyway? What is broken about the test itself?

From the PR, it looks like this test noted an error, not actually a
failure.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253#c17

cc1: warning: The '-mfloat128' option may not be fully supported

which comes out of gcc/config/rs6000/rs6000.cc
rs6000_option_override_internal() via

  /* IEEE 128-bit floating point requires VSX support. */
  if (TARGET_FLOAT128_KEYWORD)
    {
      if (!TARGET_VSX)
        {
        }
      else if (!TARGET_FLOAT128_TYPE)
        {
          TARGET_FLOAT128_TYPE = 1;
          warning (0, "The %<-mfloat128%> option may not be fully supported");
        }
    }

>
> It would be so much more useful if the tests would help us, instead of
> producing a lot of extra busy-work.
>
>
> Segher
Re: [PATCH] rs6000/test: Adjust p9-vec-length-7 sensitive to unroll [PR103196]
On Mon, 2022-02-28 at 13:37 +0800, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> As PR103196 shows, p9-vec-length-full-7.c needs to be adjusted as the
> complete unrolling can happen on some of its loops. This patch is to
> use pragma "GCC unroll 0" to disable all possible loop unrollings.
> Hope it can help the case not that fragile.

ok

Is the lack of effectiveness of "-fno-unroll-loops" otherwise understood,
or is there a further issue behind that option? I would expect the effect
of the option, versus the pragma, to be roughly equivalent. Obviously it
is not. :-)

>
> There are some other p9-vec-length* cases, I noticed that some of them
> use either bigger or unknown loop iteration counts, and
> "p9-vec-length-3*" have considered the effects of complete unrolling.
> So I just leave them alone for now.
>
> Tested on powerpc64-linux-gnu P8 and powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> PR testsuite/103196
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/p9-vec-length-7.h: Add DO_PRAGMA macro.
> * gcc.target/powerpc/p9-vec-length-epil-7.c: Use unroll pragma to
> disable any unrollings.
> * gcc.target/powerpc/p9-vec-length-full-7.c: Remove useless option.
> * gcc.target/powerpc/p9-vec-length.h: Likewise.

I suggest a slight rearrangement and correction.

The -fno-unroll-loops options are removed from *-epil-7.c and *-full-7.c.
p9-vec-length.h adds the DO_PRAGMA macro.
p9-vec-length-7.h updates (corrects?) whitespace and adds the
DO_PRAGMA (GCC unroll 0) call ahead of the test loop.
> > --- > > .../gcc.target/powerpc/p9-vec-length-7.h| 17 +++-- > > .../gcc.target/powerpc/p9-vec-length-epil-7.c | 2 +- > > .../gcc.target/powerpc/p9-vec-length-full-7.c | 2 +- > > .../gcc.target/powerpc/p9-vec-length.h | 2 ++ > > 4 files changed, 15 insertions(+), 8 deletions(-) > > > > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h > > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h > > index 4ef8f974a04..4f338565619 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h > > +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h > > @@ -7,14 +7,19 @@ > > #define START 1 > > #define END 59 > > > > +/* Note that we use pragma unroll to disable any loop unrollings. */ > > + > > #define test(TYPE) > > \ > > - TYPE x_##TYPE[N] __attribute__((aligned(16))); > > \ > > - void __attribute__((noinline, noclone)) test_npeel_##TYPE() { > > \ > > + TYPE x_##TYPE[N] __attribute__ ((aligned (16))); > > \ > > + void __attribute__ ((noinline, noclone)) test_npeel_##TYPE () > > \ > > + { > > \ > > TYPE v = 0; > > \ > > -for (unsigned int i = START; i < END; i++) { > > \ > > - x_##TYPE[i] = v; > > \ > > - v += 1; > > \ > > -} > > \ > > +DO_PRAGMA (GCC unroll 0) > > \ > > +for (unsigned int i = START; i < END; i++) > > \ > > + { > > \ > > + x_##TYPE[i] = v; \ > > + v += 1;\ > > + } > > \ > >} Some whitespace fix-ups (ok), and the addition of the "DO_PRAGMA (GCC unroll 0)". ok. 
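For anyone following along without the full headers, here is a minimal standalone sketch of the idiom under review. It assumes DO_PRAGMA is the usual stringizing wrapper around _Pragma (the p9-vec-length.h hunk is not shown in this part of the thread, so that definition is a guess), and N is a hypothetical array size picked only so that END fits.

```c
/* Assumed definition: the usual stringizing wrapper around _Pragma.  The
   real p9-vec-length.h hunk is not visible here, so treat this as a guess.  */
#define DO_PRAGMA(x) _Pragma (#x)

#define N 64      /* Hypothetical array size, chosen so that END fits.  */
#define START 1
#define END 59

/* Same shape as the test macro in p9-vec-length-7.h.  "GCC unroll 0" asks
   GCC not to unroll the loop that immediately follows the pragma.  */
#define test(TYPE)                                                    \
  TYPE x_##TYPE[N] __attribute__ ((aligned (16)));                    \
  void __attribute__ ((noinline, noclone)) test_npeel_##TYPE (void)   \
  {                                                                   \
    TYPE v = 0;                                                       \
    DO_PRAGMA (GCC unroll 0)                                          \
    for (unsigned int i = START; i < END; i++)                        \
      {                                                               \
        x_##TYPE[i] = v;                                              \
        v += 1;                                                       \
      }                                                               \
  }

/* Instantiate for one type; the real test uses TEST_ALL over many types.  */
test (int)
```

The pragma has to go through _Pragma because a #pragma directive cannot appear inside a macro body, which is presumably why the patch adds a helper macro rather than writing the pragma directly.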
> > > > TEST_ALL (test) > > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c > > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c > > index a27ee347ca1..859fedd5679 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c > > +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c > > @@ -1,5 +1,5 @@ > > /* { dg-do compile { target { lp64 && powerpc_p9vector_ok } } } */ > > -/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize > > -fno-vect-cost-model -fno-unroll-loops -ffast-math" } */ > > +/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize > > -fno-vect-cost-model -ffast-math" } */ ok > > > > /* { dg-additional-options "--param=vect-partial-vector-usage=1" } */ > > > > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c > > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c > > index 89ff38443e7..5fe542bba20 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.
Re: [PATCH, rs6000] Correct match pattern in pr56605.c
On Mon, 2022-02-28 at 11:17 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
> This patch corrects the match pattern in pr56605.c. The former pattern
> is wrong and test case fails with GCC11. It should match following insn on
> each subtarget after mode promotion is disabled. The patch need to be
> backported to GCC11.
>

Hi,

I note this patch appears to (partially?) address the P1 [11 regression]
PR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146

The issue makes reference to a different proposed patch in issue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103197
titled
  ppc inline expansion of memcpy/memmove should not use lxsibzx/stxsibx for a single byte
with a proposed patch named
  rs6000: Disparage lfiwzx and similar

I can't address any of the background or history there. :-)

> //gimple
> _17 = (unsigned int) _20;
> prolog_loop_niters.4_23 = _17 & 3;
>
> //rtl
> (insn 19 18 20 2 (parallel [
> (set (reg:CC 208)
> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3]))
> (const_int 0 [0])))
> (set (reg:SI 129 [ prolog_loop_niters.5 ])
> (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3])))
> ]) 197 {*andsi3_imm_mask_dot2}
>
>
> Bootstrapped and tested on powerpc64-linux BE/LE and AIX with no
> regressions.
> Is this okay for trunk and GCC11? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-02-28 Haochen Gui
>
> gcc/testsuite/
> PR target/102146
> * gcc.target/powerpc/pr56605.c: Correct match pattern in combine pass.
>
>
> patch.diff
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c
> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> index fdedbfc573d..231d808aa99 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia)
> ia[i] = (int) sb[i];
> }
>
> -/* { dg-final { scan-rtl-dump-times {\(compare:CC
> \((?:and|zero_extend):(?:DI) \((?:sub)?reg:[SD]I} 1 "combine" } } */
> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI
> \(reg:DI} 1 "combine" } } */

So with the update (I squint, so this is an approximate handwave), this
drops the zero_extend alternative and narrows the scan to an and:SI of a
subreg:SI of a DImode register. This appears to match the RTL as
mentioned in the patch comments.

>
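To double-check the squinting, here is a rough sketch that runs both scan patterns against the compare line quoted in the patch description. It uses POSIX regcomp rather than the Tcl regexp engine DejaGnu actually uses, so the old pattern's (?:...) groups are rewritten as plain groups; the helper name rtl_matches is made up for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex PATTERN matches somewhere in TEXT,
   0 if it does not, and -1 if the pattern fails to compile.  */
static int
rtl_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* The compare line from the combine dump quoted in the patch description.  */
static const char rtl_line[] =
  "(compare:CC (and:SI (subreg:SI (reg:DI 207) 0)";

/* Old directive, with Tcl's (?:...) groups rewritten as plain groups
   because POSIX ERE has no non-capturing syntax.  */
static const char old_pattern[] =
  "\\(compare:CC \\((and|zero_extend):(DI) \\((sub)?reg:[SD]I";

/* New directive, unchanged apart from C string escaping.  */
static const char new_pattern[] =
  "\\(compare:CC \\(and:SI \\(subreg:SI \\(reg:DI";
```

With these inputs the old pattern does not match, because it insists on DImode for the and/zero_extend, while the new pattern does. That is consistent with the test failing before the patch, though it says nothing about other subtargets.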
Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote: > Eliminate power8 fusion options, use power8 tuning, PR target/102059 > > This is V4 of the patch. Compared to V3 of the patch, GCC will just > ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign. > Hi, No comments on code, a few comments about the comments below. > The splitting of signed halfword and word loads into unsigned load and > sign extension is now suppressed with -Os, but it is done normally if we > are not optimizing for space. I see references to TARGET_P8_FUSION_SIGN in the patch below, and some removal of old code. I assume this describes the implementation that remains. > > The power8 fusion support used to be set automatically when -mcpu=power8 or > -mtune=power8 was used, and it was cleared for other cpu's. However, if you > used the target attribute or target #pragma to change the default cpu type or > tuning, you would get an error that a target specifiction option mismatch > occurred. specification. :-) > > This occurred because the rs6000_can_inline_p function just compares the ISA > bits between the called inline function and the caller. If the ISA flags of > the called function is not a subset of the ISA flags of the caller, we won't > do > the inlinging. When a power9 or power10 function inlines a function that is > explicitly compiled for power8, the power8 function has the power8 fusion bits > set and the power9 or power10 functions do not have the fusion bits set. inlining. > > This code makes the -mpower8-fusion option a nop. It is accepted without > warning, but it does nothing. Power8 fusion is only enabled if we are tuning > for a power8. > > The undocumented -mpower8-fusion-sign option is also made into a nop. > > I left in the pragma target and attribute target support for power8-fusion, > but > using it doesn't do anything now. 
This is because I told the customer who > encountered this problem that one solution was to add an explicit > no-power8-fusion option in their target pragma or attribute to work around the > problem. > > I have tested this patch on a little endian power10 system. I have tested > previous versions on little endian power9 and big endian power8 systems. > Can I apply this patch to the master branch? > > If it is accepted, I will produce a similar patch for back porting to GCC 11 > and GCC 10. > > 2022-04-12 Michael Meissner > > gcc/ > PR target/102059 > * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete. > (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks. > (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION. > * config/rs6000/rs6000.cc (rs6000_option_override_internal): > Delete code that set the power8 fusion options automatically. > (rs6000_opt_masks): Allow #pragma target and attribute target > power8-fusion option for backwards compatibility. > (rs6000_print_options_internal): Skip printing backward > compatibility options that are just ignored. > * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro. > (TARGET_P8_FUSION_SIGN): Likewise. > (MASK_P8_FUSION): Delete. > * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but > ignore it completely. > (-mpower8-fusion-sign): Likewise. > * doc/invoke.texi (RS/6000 and PowerPC Options): Delete > -mpower8-fusion. > > gcc/testsuite/ > PR target/102059 > * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion. > * gcc.dg/lto/pr102059-2_0.c: Likewise. > * gcc.target/powerpc/pr102059-3.c: Likewise. > * gcc.target/powerpc/pr102059-4.c: New test. 
> --- > gcc/config/rs6000/rs6000-cpus.def | 18 +++ > gcc/config/rs6000/rs6000.cc | 49 +-- > gcc/config/rs6000/rs6000.h| 13 - > gcc/config/rs6000/rs6000.opt | 8 +-- > gcc/doc/invoke.texi | 13 + > gcc/testsuite/gcc.dg/lto/pr102059-1_0.c | 2 +- > gcc/testsuite/gcc.dg/lto/pr102059-2_0.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr102059-3.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 + > 9 files changed, 62 insertions(+), 68 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c > > diff --git a/gcc/config/rs6000/rs6000-cpus.def > b/gcc/config/rs6000/rs6000-cpus.def > index 963947f6939..d913a3d6b73 100644 > --- a/gcc/config/rs6000/rs6000-cpus.def > +++ b/gcc/config/rs6000/rs6000-cpus.def > @@ -54,19 +54,14 @@ >| OPTION_MASK_QUAD_MEMORY \ >| OPTION_MASK_QUAD_MEMORY_ATOMIC) > > -/* ISA masks setting fusion options. */ > -#define OTHER_FUSION_MASKS (OPTION_MASK_P8_FUSION \ > - | OPTION_MASK_P8_FUSION_SIGN) > - > /* Add ISEL back into ISA 3.0, since it is supposed to be a win. Do not
Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)
Ping. On Mon, 2022-09-19 at 11:13 -0500, will schmidt wrote: > [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] > > Hi, > The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE, > and can be disabled by dependent options when it should not be. > This manifests in the issue seen in PR101865 where -mno-vsx > mistakenly disables _ARCH_PWR8. > > This change replaces the relevant TARGET_DIRECT_MOVE references > with a TARGET_POWER8 entry so that the direct_move and power8 > features can be enabled or disabled independently. > > This is done via the OPTION_MASK definitions, so this > means that some references to the OPTION_MASK_DIRECT_MOVE > option are now replaced with OPTION_MASK_POWER8. > > The existing (and rather lengthy) commentary for DIRECT_MOVE remains > in place in rs6000-c.cc:rs6000_target_modify_macros(). The > if-defined logic there will now set a __DIRECT_MOVE__ define when > TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug > purposes, but is otherwise unused. This can be removed in a > subsequent patch, or in an update of this patch, depending on feedback. > > This regests cleanly (power8,power9,power10), and resolves > PR 101865 as represented in the tests from (1/2). > > OK for trunk? > Thanks, > -Will > > > gcc/ > PR Target/101865 > * config/rs6000/rs6000-builtin.cc > (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE > usage with TARGET_POWER8. > * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): > Add __DIRECT_MOVE__ define. Replace _ARCH_PWR8_ define > conditional with OPTION_MASK_POWER8. > * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): > Add OPTION_MASK_POWER8 entry. > (POWERPC_MASKS): Same. > * config/rs6000/rs6000.cc (rs6000_option_override_internal): > Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8. > (rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8. > * config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8. 
> * config/rs6000/vsx.md (vsx_extract_): Replace > TARGET_DIRECT_MOVE usage with TARGET_POWER8. > (define_peephole2): Same. > > diff --git a/gcc/config/rs6000/rs6000-builtin.cc > b/gcc/config/rs6000/rs6000-builtin.cc > index 3ce729c1e6de..91a0f39bd796 100644 > --- a/gcc/config/rs6000/rs6000-builtin.cc > +++ b/gcc/config/rs6000/rs6000-builtin.cc > @@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins > fncode) > case ENB_P7: >return TARGET_POPCNTD; > case ENB_P7_64: >return TARGET_POPCNTD && TARGET_POWERPC64; > case ENB_P8: > - return TARGET_DIRECT_MOVE; > + return TARGET_POWER8; > case ENB_P8V: >return TARGET_P8_VECTOR; > case ENB_P9: >return TARGET_MODULO; > case ENB_P9_64: > diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc > index ca9cc42028f7..41d51b039061 100644 > --- a/gcc/config/rs6000/rs6000-c.cc > +++ b/gcc/config/rs6000/rs6000-c.cc > @@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, > HOST_WIDE_INT flags) > turned off in any of the following conditions: > 1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly > disabled and OPTION_MASK_DIRECT_MOVE was not explicitly > enabled. > 2. TARGET_VSX is off. */ > - if ((flags & OPTION_MASK_DIRECT_MOVE) != 0) > + if ((OPTION_MASK_DIRECT_MOVE) != 0) > +rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__"); > + if ((flags & OPTION_MASK_POWER8) != 0) > rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8"); >if ((flags & OPTION_MASK_MODULO) != 0) > rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9"); >if ((flags & OPTION_MASK_POWER10) != 0) > rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10"); > diff --git a/gcc/config/rs6000/rs6000-cpus.def > b/gcc/config/rs6000/rs6000-cpus.def > index c3825bcccd84..c873f6d58989 100644 > --- a/gcc/config/rs6000/rs6000-cpus.def > +++ b/gcc/config/rs6000/rs6000-cpus.def > @@ -48,10 +48,11 @@ > system. 
*/ > #define ISA_2_7_MASKS_SERVER (ISA_2_6_MASKS_SERVER \ >| OPTION_MASK_P8_VECTOR\ >| OPTION_MASK_CRYPTO \ >| OPTION_MASK_DIRECT_MOVE \ > + | OPTION_MASK_POWER8 \ >| OPTION_MASK_EFFICIENT_UNALIGNED_VSX \ >| OPTION_MASK_QUAD_MEMORY \ >| OPTION_MASK_QUAD_MEMORY_ATOMIC) > > /* ISA masks setting fusion options. */ > @@ -124,10 +125,11 @@ > #define POWERPC_MASKS(OPTION_MASK_ALTIVEC > \ >| OPTION_MASK_CMPB \ >| OPTION_MASK_C
Re: [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)
On Mon, 2022-10-17 at 10:32 -0500, Segher Boessenkool wrote:
> Hi!
>
> Everything Ke Wen said. Some more comments / hints:

Thanks for the reviews. :-)

I'll rework things and repost 'soon'.

Thanks
-Will
Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)
On Mon, 2022-10-17 at 13:08 -0500, Segher Boessenkool wrote: > On Mon, Sep 19, 2022 at 11:13:20AM -0500, will schmidt wrote: > > The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE, > > and can be disabled by dependent options when it should not be. > > This manifests in the issue seen in PR101865 where -mno-vsx > > mistakenly disables _ARCH_PWR8. > > This change replaces the relevant TARGET_DIRECT_MOVE references > > with a TARGET_POWER8 entry so that the direct_move and power8 > > features can be enabled or disabled independently. > > We should get rid of TARGET_DIRECT_MOVE altogether. Please see > 57f108f5a1e1: > rs6000: Disable -m[no-]direct-move (PR85293) > > The -mno-direct-move option causes a lot of problems, since it > forces > us to be able to generate code for p8 and up with some crucial > instructions missing. This patch removes the -m[no-]direct-move > options so that the user cannot put us into this unexpected > situation > anymore. Internally we still have all the same flags, and they > are > automatically set based on -mcpu; getting rid of that is a lot > more > work and will have to wait for GCC 9 (in some places the flag is > used > to see if we are compiling for a p8 _at all_). > > It did not happen in GCC 9 obviously. Do you want to take a > shot? It > doesn't have to be all at once, it's probably best if not even -- as > I > wrote in the commit message, the flag always was used to mean > different > things. As long as it's OK to be removed, I'll certainly take a shot at it. With that in mind that may simplify things for me here. I expect that anything currently guarded by DIRECT_MOVE should instead be guarded by POWER8. > > > The existing (and rather lengthy) commentary for DIRECT_MOVE > > remains > > in place in rs6000-c.cc:rs6000_target_modify_macros(). The > > if-defined logic there will now set a __DIRECT_MOVE__ define when > > TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug > > purposes, but is otherwise unused. 
This can be removed in a > > subsequent patch, or in an update of this patch, depending on > > feedback. > > There should be no such macro, for the same reason there should be no > -mdirect-move option: it is so very essential to all code we > generate, > it *always* is enabled if we have P8 or later. fair enough. > > > gcc/ > > PR Target/101865 > > * config/rs6000/rs6000-builtin.cc > > (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE > > usage with TARGET_POWER8. > > Please don't arbitrarily wrap lines. It is harder to read, and it > looks > like something is missing. > > > * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): > > Add OPTION_MASK_POWER8 entry. > > Especially in cases like this, where it looks like you forgot to > write > something after the colon. > > > @@ -24046,10 +24045,11 @@ static struct rs6000_opt_mask const > > rs6000_opt_masks[] = > >{ "block-ops-vector-pair", OPTION_MASK_BLOCK_OPS_VECTOR_PA > > IR, > > false, > > true }, > >{ "cmpb",OPTION_MASK_CMPB, fal > > se, true }, > >{ "crypto", OPTION_MASK_CRYPTO, fal > > se, true }, > >{ "direct-move", OPTION_MASK_DIRECT_MOVE,false, > > true }, > > + { "power8", OPTION_MASK_POWER8, fal > > se, true }, > > Why would we want a #pragma power8 ? Hmm, thinko on my part, i'll reevaluate. > > > --- a/gcc/config/rs6000/rs6000.opt > > +++ b/gcc/config/rs6000/rs6000.opt > > @@ -490,10 +490,15 @@ mcrypto > > Target Mask(CRYPTO) Var(rs6000_isa_flags) > > Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 > > instructions. > > > > mdirect-move > > Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) > > WarnRemoved > > +Enable direct move (ISA 2.07). > > It is undocumented and should remain that, except eventually we > should > remove it completely (but leave some stubs so that code in the wild > keeps compiling). > > > +mpower8 > > +Target Mask(POWER8) Var(rs6000_isa_flags) > > +Use instructions added in ISA 2.07 (power8). > > There should not be such an option. 
> It is set by -mcpu=power8 and
> later, but can never be enabled or disabled directly by the user.

OK.

Thanks for the detailed review. :-)
-Will

>
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -3407,11 +3407,11 @@ (define_insn "vsx_extract_"
> >if (element == VECTOR_ELEMENT_SCALAR_64BIT)
> > {
> >if (op0_regno == op1_regno)
> > return ASM_COMMENT_START " vec_extract to same register";
> >
> > - else if (INT_REGNO_P (op0_regno) && TARGET_DIRECT_MOVE
> > + else if (INT_REGNO_P (op0_regno) && TARGET_POWER8
> >&& TARGET_POWERPC64)
>
> That fits on one line now.
>
> Thanks,
>
>
> Segher
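For reference, the user-visible symptom behind PR101865 can be probed with something along these lines. This is only a sketch: the function names are made up, and the expectation that -mcpu=power8 -mno-vsx should still leave _ARCH_PWR8 defined is the intent stated in this thread, not something the snippet verifies by itself.

```c
/* Return 1 if the compiler predefined _ARCH_PWR8 for this translation
   unit, 0 otherwise.  Hypothetical helper for illustration.  */
static int
have_arch_pwr8 (void)
{
#ifdef _ARCH_PWR8
  return 1;
#else
  return 0;
#endif
}

/* Return 1 if the compiler predefined __VSX__, 0 otherwise.
   Hypothetical helper for illustration.  */
static int
have_vsx (void)
{
#ifdef __VSX__
  return 1;
#else
  return 0;
#endif
}
```

Built for a PowerPC target with -mcpu=power8 -mno-vsx, the hoped-for result is have_arch_pwr8 () returning 1 while have_vsx () returns 0. On other targets both simply return 0.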
Re: [PATCH] Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293
On Fri, 2022-05-13 at 10:49 -0400, Michael Meissner wrote:
> Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293.
>
> This patch has been previously posted, but it seemed to get lost.:
>
> > Date: Tue, 29 Mar 2022 23:25:31 -0400
> > Subject: [PATCH, V2] Optimize vec_splats of constant vec_extract for
> > V2DI/V2DF, PR target 99293.
> > Message-ID:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592509.html
>
> I had originally posted a previous version of this patch here. There were
> changes asked for, which I did in this patch.
>
> > Date: Mon, 28 Mar 2022 12:26:02 -0400
> > Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for
> > V2DI/V2DF, PR target 99293.
> > Message-ID:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
>

Hi,
generally lgtm. A few typos and comment suggestions called out below.
Thanks.
-Will

> In PR target/99293, it was pointed out that doing:
>
> vector long long dest0, dest1, src;
> /* ... */
> dest0 = vec_splats (vec_extract (src, 0));
> dest1 = vec_splats (vec_extract (src, 1));
>
> would generate slower code.
>
> It generates the following code on power8:
>
> ;; vec_splats (vec_extract (src, 0))
> xxpermdi 0,34,34,3
> xxpermdi 34,0,0,0
>
> ;; vec_splats (vec_extract (src, 1))
> xxlor 0,34,34
> xxpermdi 34,0,0,0
>
> However on power9 and power10 it generates:
>
> ;; vec_splats (vec_extract (src, 0))
> mfvsrld 3,34
> mtvsrdd 34,9,9
>
> ;; vec_splats (vec_extract (src, 1))
> mfvsrd 9,34
> mtvsrdd 34,9,9
>
> This is due to the power9 having the mfvsrld instruction which can extract
> either 64-bit element into a GPR. While there are alternatives for both
> vector registers and GPR registers, the register allocator prefers to put
> DImode into GPR registers.
>
> However in this case, it is better to have a single combiner pattern that
> can generate a single xxpermdi, instead of doing 2 insnsns (the extract

I like the idea of insnsns being the plural of insn.
This is particularly true if the two operations are > move from vector register and move to vector register. As Segher pointed > out in a previous version of the patch, the combiner already tries doing s/doing// > creating a (vec_duplicate (vec_select ...)) pattern, but we didn't provide > one. > > This patch reworks vsx_xxspltd_ for V2DImode and V2DFmode so that it > no longer uses an UNSPEC. Instead it uses VEC_DUPLICATE, which the > combiner checks for. potentially s/no longer uses an UNSPEC. Instead it uses/now uses/ and possibly s/ch ecks for/can find/ > > I have built Spec 2017 with this patch installed, and the cam4_r benchmark > is the only benchmark that generated different code (3 mfvsrld/mtvsrdd > pairs of instructions were replaced with xxpermdi). > > I have built bootstrap versions on the following systems and I have run > the regression tests. There were no regressions in the runs: > > Power9 little endian, --with-cpu=power9 > Power10 little endian, --with-cpu=power10 > Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests) > > Can I install this into the trunk? After a burn-in period, can I backport > and install this into GCC 12, GCC 11 and GCC 10 branches? > > 2022-05-13 Michael Meissner > > gcc/ > PR target/99293 > * config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove > UNSPEC_VSX_XXSPLTD case. > * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete. > (vsx_xxspltd_): Rewrite to use VEC_DUPLICATE. > > gcc/testsuite: > PR target/99293 > * gcc.target/powerpc/builtins-1.c: Update insn count. > * gcc.target/powerpc/pr99293.c: New test. 
> --- > gcc/config/rs6000/rs6000-p8swap.cc| 1 - > gcc/config/rs6000/vsx.md | 19 +- > gcc/testsuite/gcc.target/powerpc/builtins-1.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr99293.c| 36 +++ > 4 files changed, 47 insertions(+), 11 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c > > diff --git a/gcc/config/rs6000/rs6000-p8swap.cc > b/gcc/config/rs6000/rs6000-p8swap.cc > index d301bc3fe59..1973d9c8245 100644 > --- a/gcc/config/rs6000/rs6000-p8swap.cc > +++ b/gcc/config/rs6000/rs6000-p8swap.cc > @@ -805,7 +805,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special) > case UNSPEC_VUPKLU_V4SF: > return 0; > case UNSPEC_VSPLT_DIRECT: > - case UNSPEC_VSX_XXSPLTD: > *special = SH_SPLAT; > return 1; > case UNSPEC_REDUC_PLUS: ok > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index 1b75538f42f..a1a1ce95195 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -296,7 +296,6 @@ (define_c_enum "unspec" > UNSPEC_VSX_XXPERM > > UNSPEC_VSX_XXSPLTW > - UNSPEC_VSX_XXSPL
Re: [PATCH] Replace UNSPEC with RTL code for extendditi2.
On Fri, 2022-05-13 at 10:52 -0400, Michael Meissner wrote: > Replace UNSPEC with RTL code for extendditi2. > Hi, > When I submitted my patch on March 12th for extendditi2, Segher > wished I > had removed the use of the UNSPEC for the vextsd2q instruction. This > patch rewrites extendditi2_vector to use VEC_SELECT rather than > UNSPEC. I'd suggest a paragraph break between the two sentences. > > 2022-05-13 Michael Meissner > > gcc/ > * config/rs6000/vsx.md (UNSPEC_EXTENDDITI2): Delete. > (extendditi2_vector): Rewrite to use VEC_SELECT as a > define_expand. > (extendditi2_vector2): New insn. Ok, so per my interpretation of the patch below, it converts the define_insn extendditi2_vector into a define_expand, and creates a new extendditi2_vector2 instruction. Content below seems reasonable, I've not reviewed it extensively. Thanks -Will > --- > gcc/config/rs6000/vsx.md | 22 ++ > 1 file changed, 18 insertions(+), 4 deletions(-) > > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index a1a1ce95195..c091e5e2f47 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -358,7 +358,6 @@ (define_c_enum "unspec" > UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX > UNSPEC_XXGENPCV > UNSPEC_MTVSBM > - UNSPEC_EXTENDDITI2 > UNSPEC_VCNTMB > UNSPEC_VEXPAND > UNSPEC_VEXTRACT > @@ -5083,10 +5082,25 @@ (define_insn_and_split "extendditi2" > (set_attr "type" "shift,load,vecmove,vecperm,load")]) > > ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in > TI reg > -(define_insn "extendditi2_vector" > +(define_expand "extendditi2_vector" > + [(use (match_operand:TI 0 "gpc_reg_operand")) > + (use (match_operand:TI 1 "gpc_reg_operand"))] > + "TARGET_POWER10" > +{ > + rtx dest = operands[0]; > + rtx src_v2di = gen_lowpart (V2DImode, operands[1]); > + rtx element = GEN_INT (VECTOR_ELEMENT_SCALAR_64BIT); > + > + emit_insn (gen_extendditi2_vector2 (dest, src_v2di, element)); > + DONE; > +}) > + > +(define_insn "extendditi2_vector2" >[(set (match_operand:TI 
0 "gpc_reg_operand" "=v") > - (unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")] > - UNSPEC_EXTENDDITI2))] > + (sign_extend:TI > + (vec_select:DI > + (match_operand:V2DI 1 "gpc_reg_operand" "v") > + (parallel [(match_operand 2 "vsx_scalar_64bit" "wD")]] >"TARGET_POWER10" >"vextsd2q %0,%1" >[(set_attr "type" "vecexts")]) > -- > 2.35.3 > >
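Editor's note on the pattern above: the rewritten insn describes vextsd2q as a sign_extend of one 64-bit element selected out of a V2DI register, instead of an opaque UNSPEC. The arithmetic this expresses can be sketched in C as below. This is only an illustration under stated assumptions: the struct ti_pair and the helper sign_extend_di_to_ti are hypothetical names invented for this note and are not part of GCC.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit (TImode) value as two 64-bit halves. */
typedef struct {
    uint64_t hi;  /* most significant 64 bits */
    uint64_t lo;  /* least significant 64 bits */
} ti_pair;

/* Sign extend a 64-bit value to 128 bits: the low half is the source
   unchanged, the high half is filled with copies of the source sign bit.  */
static ti_pair sign_extend_di_to_ti(uint64_t src)
{
    ti_pair dest;
    dest.lo = src;
    dest.hi = (src >> 63) ? UINT64_MAX : 0;
    return dest;
}
```

A negative input such as 0x8000000000000000 yields a high half of all ones, while a small positive input yields a high half of zero.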
Re: [PATCH] Delay splitting addti3/subti3 until first split pass.
On Fri, 2022-05-13 at 11:08 -0400, Michael Meissner wrote: > Add zero_extendditi2. Improve lxvr*x code generation. > Hi, > Subject: Re: [PATCH] Delay splitting addti3/subti3 until first split pass. Subject does not seem to match contents? > This pattern adds zero_extendditi2 so that if we are extending DImode that > is in a GPR register to TImode in a vector register, the compiler can > generate MTVSRDDD. Just "mtvsrdd". > > In addition the patterns for generating lxvr{b,h,w,d}x were tuned to allow > loading to gpr registers. This prevents needlessly doing direct moves to > get the value into the vector registers if the gpr register was already > selected. > > In updating the insn counts for two tests due to these changes, I noticed > the tests were done at -O0. I changed this so that the tests are now done > at the normal -O2 optimization level. > s/normal/default/ ? > This patch will be needed for an upcoming patch for PR target/103109. > ok > I have built this patch on little endian power10, little endian power9, > and big endian power8 systems. There were no regressions with this > patch. Can I install this on the GCC 13 trunk? > > 2022-05-013 Michael Meissner > > gcc/ > * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading to > GPR registers. > (vsx_stxvrx): Add support for storing from GPR registers. swap the froms and tos ? > (zero_extendditi2): New insn. > > gcc/testsuite/ > * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2 > instead of -O0 and update insn counts. > * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise. > * gcc.target/powerpc/zero-extend-di-ti.c: New test. 
> --- > gcc/config/rs6000/vsx.md | 82 +-- > .../powerpc/vsx-load-element-extend-int.c | 36 > .../powerpc/vsx-load-element-extend-short.c | 35 > .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++ > 4 files changed, 164 insertions(+), 51 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c > > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index c091e5e2f47..ad971e3a1de 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_" > } > }) > > -;; Load rightmost element from load_data > -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx. > -(define_insn "vsx_lxvrx" > - [(set (match_operand:TI 0 "vsx_register_operand" "=wa") > - (zero_extend:TI (match_operand:INT_ISA3 1 "memory_operand" "Z")))] > - "TARGET_POWER10" > - "lxvrx %x0,%y1" > - [(set_attr "type" "vecload")]) > +;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, > lxvrdx. > +;; Support TImode being in a GPR register to prevent generating lvxr{d,w,b}x > +;; and then two direct moves if we ultimately need the value in a GPR > register. Perhaps break into two sentences and split the description of what is prevented in a separate sentence. ? > +(define_insn_and_split "vsx_lxvrx" > + [(set (match_operand:TI 0 "register_operand" "=r,wa") > + (zero_extend:TI (match_operand:INT_ISA3 1 "memory_operand" "m,Z")))] > + "TARGET_POWERPC64 && TARGET_POWER10" > + "@ > + # > + lxvrx %x0,%y1" > + "&& reload_completed && int_reg_operand (operands[0], TImode)" > + [(set (match_dup 2) (match_dup 3)) > + (set (match_dup 4) (const_int 0))] > +{ > + rtx op0 = operands[0]; > + rtx op1 = operands[1]; > + > + operands[2] = gen_lowpart (DImode, op0); > + operands[3] = (mode == DImode > + ? 
op1 > + : gen_rtx_ZERO_EXTEND (DImode, op1)); > + > + operands[4] = gen_highpart (DImode, op0); > +} > + [(set_attr "type" "load,vecload") > + (set_attr "num_insns" "2,*")]) > > ;; Store rightmost element into store_data > ;; using stxvrbx, stxvrhx, strvxwx, strvxdx. > @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di" >DONE; > }) > > +;; Zero extend DI to TI. If we don't have the MTVSRDD instruction (and > LXVRDX > +;; in the case of power10), we use the machine independent code. If we are > +;; loading up GPRs, we fall back to the old code. Will 'old code' have meaning to future readers of this lump of code? > +(define_insn_and_split "zero_extendditi2" > + [(set (match_operand:TI 0 "register_operand" "=r,r, > wa,&wa") > + (zero_extend:TI (match_operand:DI 1 "register_operand" "r,wa,r, > wa")))] > + "TARGET_POWERPC64 && TARGET_P9_VECTOR" > + "@ > + # > + # > + mtvsrdd %x0,0,%1 > + #" > + "&& reload_completed > + && (int_reg_operand (operands[0], TImode) > + || vsx_register_operand (operands[1], DImode))" > + [(pc)] > +{ > + rtx dest = operands[0]; > + rtx src = operands[1]; > + int dest_regno = reg_or_subregno (dest); > + > + /* Handle conversion to GPR registers. Load up the low part and then do > + zero out the upper part. */ s/do// > + if (IN
Re: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.
On Fri, 2022-05-13 at 12:13 -0400, Michael Meissner wrote: > Add zero_extendditi2. Improve lxvr*x code generation. > Content here matches what I commented on in the prior email with subject "Delay splitting addti3...". > This pattern adds zero_extendditi2 so that if we are extending DImode > that > is in a GPR register to TImode in a vector register, the compiler can > generate MTVSRDDD. > > In addition the patterns for generating lxvr{b,h,w,d}x were tuned to > allow > loading to gpr registers. This prevents needlessly doing direct > moves to > get the value into the vector registers if the gpr register was > already > selected. > > In updating the insn counts for two tests due to these changes, I > noticed > the tests were done at -O0. I changed this so that the tests are now > done > at the normal -O2 optimization level. > > This patch will be needed for an upcoming patch for PR target/103109. > > I have built this patch on little endian power10, little endian > power9, > and big endian power8 systems. There were no regressions with this > patch. Can I install this on the GCC 13 trunk? > > 2022-05-013 Michael Meissner > > gcc/ > * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading > to > GPR registers. > (vsx_stxvrx): Add support for storing from GPR registers. > (zero_extendditi2): New insn. > > gcc/testsuite/ > * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2 > instead of -O0 and update insn counts. > * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise. > * gcc.target/powerpc/zero-extend-di-ti.c: New test. 
> --- > gcc/config/rs6000/vsx.md | 82 > +-- > .../powerpc/vsx-load-element-extend-int.c | 36 > .../powerpc/vsx-load-element-extend-short.c | 35 > .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++ > 4 files changed, 164 insertions(+), 51 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di- > ti.c > > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index c091e5e2f47..ad971e3a1de 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_" > } > }) > > -;; Load rightmost element from load_data > -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx. > -(define_insn "vsx_lxvrx" > - [(set (match_operand:TI 0 "vsx_register_operand" "=wa") > - (zero_extend:TI (match_operand:INT_ISA3 1 "memory_operand" > "Z")))] > - "TARGET_POWER10" > - "lxvrx %x0,%y1" > - [(set_attr "type" "vecload")]) > +;; Load rightmost element from load_data using lxvrbx, lxvrhx, > lxvrwx, lxvrdx. > +;; Support TImode being in a GPR register to prevent generating > lvxr{d,w,b}x > +;; and then two direct moves if we ultimately need the value in a > GPR register. > +(define_insn_and_split "vsx_lxvrx" > + [(set (match_operand:TI 0 "register_operand" "=r,wa") > + (zero_extend:TI (match_operand:INT_ISA3 1 "memory_operand" > "m,Z")))] > + "TARGET_POWERPC64 && TARGET_POWER10" > + "@ > + # > + lxvrx %x0,%y1" > + "&& reload_completed && int_reg_operand (operands[0], TImode)" > + [(set (match_dup 2) (match_dup 3)) > + (set (match_dup 4) (const_int 0))] > +{ > + rtx op0 = operands[0]; > + rtx op1 = operands[1]; > + > + operands[2] = gen_lowpart (DImode, op0); > + operands[3] = (mode == DImode > + ? op1 > + : gen_rtx_ZERO_EXTEND (DImode, op1)); > + > + operands[4] = gen_highpart (DImode, op0); > +} > + [(set_attr "type" "load,vecload") > + (set_attr "num_insns" "2,*")]) > > ;; Store rightmost element into store_data > ;; using stxvrbx, stxvrhx, strvxwx, strvxdx. 
> @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di" >DONE; > }) > > +;; Zero extend DI to TI. If we don't have the MTVSRDD instruction > (and LXVRDX > +;; in the case of power10), we use the machine independent code. If > we are > +;; loading up GPRs, we fall back to the old code. > +(define_insn_and_split "zero_extendditi2" > + [(set (match_operand:TI 0 > "register_operand" "=r,r, wa,&wa") > + (zero_extend:TI (match_operand:DI 1 > "register_operand" "r,wa,r, wa")))] > + "TARGET_POWERPC64 && TARGET_P9_VECTOR" > + "@ > + # > + # > + mtvsrdd %x0,0,%1 > + #" > + "&& reload_completed > + && (int_reg_operand (operands[0], TImode) > + || vsx_register_operand (operands[1], DImode))" > + [(pc)] > +{ > + rtx dest = operands[0]; > + rtx src = operands[1]; > + int dest_regno = reg_or_subregno (dest); > + > + /* Handle conversion to GPR registers. Load up the low part and > then do > + zero out the upper part. */ > + if (INT_REGNO_P (dest_regno)) > +{ > + rtx dest_hi = gen_highpart (DImode, dest); > + rtx dest_lo = gen_lowpart (DImode, dest); > + > + emit_move_insn (dest_lo, src); > + emit_move_insn (dest_hi, const0_rtx); > + DONE; > +} > + > + /
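Editor's note on the GPR path quoted above: when the destination is a GPR pair, the split moves the source into the low doubleword and zeroes the high doubleword. A minimal C sketch of that step follows. The struct ti_pair and the helper zero_extend_di_to_ti are hypothetical names used only for this illustration; the comments point at the emit_move_insn calls in the quoted patch that each statement is meant to mirror.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TImode value held in two 64-bit GPRs. */
typedef struct {
    uint64_t hi;  /* plays the role of gen_highpart (DImode, dest) */
    uint64_t lo;  /* plays the role of gen_lowpart (DImode, dest) */
} ti_pair;

/* Zero extend a 64-bit value to 128 bits. */
static ti_pair zero_extend_di_to_ti(uint64_t src)
{
    ti_pair dest;
    dest.lo = src;  /* mirrors emit_move_insn (dest_lo, src) */
    dest.hi = 0;    /* mirrors emit_move_insn (dest_hi, const0_rtx) */
    return dest;
}
```

Unlike sign extension, the high half is zero regardless of the top bit of the source.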
Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109.
On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote: > Optimize multiply/add of DImode extended to TImode, PR target/103109. > > On power9 and power10 systems, we have instructions that support doing > 64-bit integers converted to 128-bit integers and producing 128-bit > results. This patch adds support to generate these instructions. > > Previously GCC had define_expands to handle conversion of the 64-bit > extend to 128-bit and multiply. This patch changes these define_expands > to define_insn_and_split and then it provides combiner patterns to > generate thes multiply/add instructions. > > To support using this optimization on power9, this patch extends the sign > extend DImode to TImode to also run on power9 (added for PR > target/104698). > > This patch needs the previous patch to add unsigned DImode to TImode > conversion so that the combiner can combine the extend, multiply, and add > instructions. > > I have built this patch on little endian power10, little endian power9, and > big > endian power8 systems. There were no regressions when I ran it. Can I > install > this patch into the GCC 13 master branch? > > 2022-05-13 Michael Meissner > > gcc/ > PR target/103109 > * config/rs6000/rs6000.md (su_int32): New code attribute. > (mul3): Convert from define_expand to > define_insn_and_split. > (maddld4): Add generator function. -(define_insn "*maddld4" +(define_insn "maddld4" Is the removal of the "*" considering adding generator? (Thats terminology that I'm not immediately familiar with). > (mulditi3_adddi3): New insn. > (mulditi3_add_const): New insn. > (mulditi3_adddi3_upper): New insn. > > gcc/testsuite/ > PR target/103109 > * gcc.target/powerpc/pr103109.c: New test. 
ok > --- > gcc/config/rs6000/rs6000.md | 128 +++- > gcc/testsuite/gcc.target/powerpc/pr103109.c | 62 ++ > 2 files changed, 184 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103109.c > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 2aba70393d8..83eacec57ba 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -667,6 +667,9 @@ (define_code_attr uns [(fix "") > (float "") > (unsigned_float "uns")]) > > +(define_code_attr su_int32 [(sign_extend "s32bit_cint_operand") > + (zero_extend "c32bit_cint_operand")]) > + > ; Various instructions that come in SI and DI forms. > ; A generic w/d attribute, for things like cmpw/cmpd. > (define_mode_attr wd [(QI"b") > @@ -3190,13 +3193,16 @@ (define_insn "mulsi3_highpart_64" >"mulhw %0,%1,%2" >[(set_attr "type" "mul")]) > > -(define_expand "mul3" > - [(set (match_operand: 0 "gpc_reg_operand") > +(define_insn_and_split "mul3" > + [(set (match_operand: 0 "gpc_reg_operand" "=&r") > (mult: (any_extend: > - (match_operand:GPR 1 "gpc_reg_operand")) > +(match_operand:GPR 1 "gpc_reg_operand" "r")) > (any_extend: > - (match_operand:GPR 2 "gpc_reg_operand"] > +(match_operand:GPR 2 "gpc_reg_operand" "r"] >"!(mode == SImode && TARGET_POWERPC64)" > + "#" > + "&& 1" > + [(pc)] > { >rtx l = gen_reg_rtx (mode); >rtx h = gen_reg_rtx (mode); > @@ -3205,9 +3211,10 @@ (define_expand "mul3" >emit_move_insn (gen_lowpart (mode, operands[0]), l); >emit_move_insn (gen_highpart (mode, operands[0]), h); >DONE; > -}) > +} > + [(set_attr "length" "8")]) > ok > -(define_insn "*maddld4" > +(define_insn "maddld4" >[(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > (plus:GPR (mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") > (match_operand:GPR 2 "gpc_reg_operand" "r")) ok > @@ -3216,6 +3223,115 @@ (define_insn "*maddld4" >"maddld %0,%1,%2,%3" >[(set_attr "type" "mul")]) > > +(define_insn_and_split "*mulditi3_adddi3" > + [(set (match_operand:TI 0 
"gpc_reg_operand" "=&r") > + (plus:TI > + (mult:TI > + (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r")) > + (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r"))) > + (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"] > + "TARGET_MADDLD && TARGET_POWERPC64" > + "#" > + "&& 1" > + [(pc)] > +{ > + rtx dest = operands[0]; > + rtx dest_hi = gen_highpart (DImode, dest); > + rtx dest_lo = gen_lowpart (DImode, dest); > + rtx op1 = operands[1]; > + rtx op2 = operands[2]; > + rtx op3 = operands[3]; > + rtx tmp_hi, tmp_lo; > + > + if (can_create_pseudo_p ()) > +{ > + tmp_hi = gen_reg_rtx (DImode); > + tmp_lo = gen_reg_rtx (DImode); > +} > + else > +{ > + tmp_hi = dest_hi; > + tmp_lo = dest_lo; > +} > + > + emit_insn (gen_mulditi3_adddi3_upper (tmp_hi, op1, o
Re: [PATCH] Generate vadduqm and vsubuqm for TImode add/subtract
On Fri, 2022-05-13 at 12:19 -0400, Michael Meissner wrote: > Generate vadduqm and vsubuqm for TImode add/subtract > > If the TImode variable is in an Altivec register instead of a GPR > register, then generate vadduqm and vsubuqm instead of having to move the > value to the GPR registers and doing the add and subtract with carry > instructions. To do this, we have to delay the splitting of the addition > and subtraction until after register allocation. Ok. > > I have built this patch on little endian power10, little endian power9, and > big > endian power8 systems. There were no regressions. Can I install this patch > to > the GCC 13 master branch? > > 2022-05-13 Michael Meissner > > gcc/ > * config/rs6000/rs6000.md (addti3): Generate vadduqm if we are > using the Altivec registers. > (subti3): Generate vsubuqm if we using the Altivec registers. > (negti3): New insn. > > gcc/testsuite/ > * gcc.target/powerpc/vadduqm-vsubuqm.c: New test. > --- > gcc/config/rs6000/rs6000.md | 82 ++- > .../gcc.target/powerpc/vadduqm-vsubuqm.c | 22 + > 2 files changed, 83 insertions(+), 21 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/vadduqm-vsubuqm.c > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 83eacec57ba..f120ca0b48d 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -7139,15 +7139,22 @@ (define_expand "feraiseexceptsi" > ;; > ;; Addti3/subti3 are define_insn_and_splits instead of define_expand, to > allow > ;; for combine to make things like multiply and add with extend operations. > +;; > +;; Also add support in case the 128-bit integer happens to be an Altivec > +;; register. 
> > (define_insn_and_split "addti3" > - [(set (match_operand:TI 0 "gpc_reg_operand" "=&r,r,r") > - (plus:TI (match_operand:TI 1 "gpc_reg_operand" "r, 0,r") > - (match_operand:TI 2 "reg_or_short_operand" "rI,r,0"))) > + [(set (match_operand:TI 0 "gpc_reg_operand" "=&r, r,r,v") > + (plus:TI (match_operand:TI 1 "gpc_reg_operand" "r, 0,r,v") > + (match_operand:TI 2 "reg_or_short_operand" "rI,r,0,v"))) Nit.. I still can't tell of the "r, 0,r,v" should be comma-space, or comma delimited. Remainder looks OK. thanks -Will > (clobber (reg:DI CA_REGNO))] >"TARGET_64BIT" > - "#" > - "&& 1" > + "@ > + # > + # > + # > + vadduqm %0,%1,%2" > + "&& reload_completed && int_reg_operand (operands[0], TImode)" >[(pc)] > { >rtx lo0 = gen_lowpart (DImode, operands[0]); > @@ -7157,27 +7164,27 @@ (define_insn_and_split "addti3" >rtx hi1 = gen_highpart (DImode, operands[1]); >rtx hi2 = gen_highpart_mode (DImode, TImode, operands[2]); > > - if (!reg_or_short_operand (lo2, DImode)) > -lo2 = force_reg (DImode, lo2); > - if (!adde_operand (hi2, DImode)) > -hi2 = force_reg (DImode, hi2); > - >emit_insn (gen_adddi3_carry (lo0, lo1, lo2)); >emit_insn (gen_adddi3_carry_in (hi0, hi1, hi2)); >DONE; > } > - [(set_attr "length" "8") > + [(set_attr "length" "8,8,8,*") > + (set_attr "isa""*,*,*,p8v") > (set_attr "type" "add") > (set_attr "size" "128")]) > > (define_insn_and_split "subti3" > - [(set (match_operand:TI 0 "gpc_reg_operand""=&r,r,r") > - (minus:TI (match_operand:TI 1 "reg_or_short_operand" "rI,0,r") > - (match_operand:TI 2 "gpc_reg_operand" "r, r,0"))) > + [(set (match_operand:TI 0 "gpc_reg_operand""=&r, r,r,v") > + (minus:TI (match_operand:TI 1 "reg_or_short_operand" "rI,0,r,v") > + (match_operand:TI 2 "gpc_reg_operand" "r, r,0,v"))) > (clobber (reg:DI CA_REGNO))] >"TARGET_64BIT" > - "#" > - "&& 1" > + "@ > + # > + # > + # > + vsubuqm %0,%1,%2" > + "&& reload_completed && int_reg_operand (operands[0], TImode)" >[(pc)] > { >rtx lo0 = gen_lowpart (DImode, operands[0]); > @@ -7187,16 +7194,49 
@@ (define_insn_and_split "subti3" >rtx hi1 = gen_highpart_mode (DImode, TImode, operands[1]); >rtx hi2 = gen_highpart (DImode, operands[2]); > > - if (!reg_or_short_operand (lo1, DImode)) > -lo1 = force_reg (DImode, lo1); > - if (!adde_operand (hi1, DImode)) > -hi1 = force_reg (DImode, hi1); > - >emit_insn (gen_subfdi3_carry (lo0, lo2, lo1)); >emit_insn (gen_subfdi3_carry_in (hi0, hi2, hi1)); >DONE; > +} > + [(set_attr "length" "8,8,8,*") > + (set_attr "isa""*,*,*,p8v") > + (set_attr "type" "add") > + (set_attr "size" "128")]) > + > +;; 128-bit integer negation, normally use GPRs. If we are using Altivec > +;; registers, create a 0 and do a vsubuqm. > +(define_insn_and_split "negti3" > + [(set (match_operand:TI 0 "gpc_reg_operand" "=&r,&v") > + (neg:TI (match_operand:TI 1 "gpc_reg_operand" "r,v"))) > + (clobber (reg:DI CA_REGNO))] > + "TARGET_64BIT" > + "#" >
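Editor's note on the splits above: for the GPR alternatives, addti3 and subti3 are split into a low-half operation that produces a carry and a high-half operation that consumes it, while the Altivec alternative emits a single vadduqm or vsubuqm. The GPR arithmetic can be sketched in C as below. This is a sketch under assumptions, not the GCC implementation: ti_pair, add_ti and sub_ti are hypothetical names, and the carry is modelled as an ordinary variable rather than the CA register.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TImode value held in two 64-bit GPRs. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} ti_pair;

/* 128-bit add: add the low halves, then add the high halves plus carry. */
static ti_pair add_ti(ti_pair a, ti_pair b)
{
    ti_pair r;
    r.lo = a.lo + b.lo;
    uint64_t carry = r.lo < a.lo;   /* 1 if the low add wrapped around */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* 128-bit subtract: subtract the low halves, then the high halves minus borrow. */
static ti_pair sub_ti(ti_pair a, ti_pair b)
{
    ti_pair r;
    r.lo = a.lo - b.lo;
    uint64_t borrow = a.lo < b.lo;  /* 1 if the low subtract wrapped around */
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```

Negation, as in the negti3 comment, then corresponds to sub_ti applied to a zero value and the operand.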
[PATCH, rs6000] Remove the (no longer used) RS6000_BTC defines.
[PATCH, rs6000] Remove the (no longer used) RS6000_BTC defines. Hi, These defines are no longer used once the rs6000 built-in reworks were completed. Would be good to remove them. There was a reference to RS6000_BTC_SPECIAL in a TODO comment in rs6000-builtins.def. That comment remains, but I have updated the comment to refer to "SPECIAL" processing, instead of having it refer directly to the RS6000_BTC_SPECIAL macro. 2022-05-17 Will Schmidt gcc/ * config/rs6000/rs6000-builtins.def: rephrase RS6000_BTC_SPECIAL in comment. * config/rs6000/rs6000.h: Remove definitions RS6000_BTC_UNARY, RS6000_BTC_BINARY, RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY, RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK, RS6000_BTC_SPECIAL, RS6000_BTC_PREDICATE, RS6000_BTC_ABS, RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_MISC, RS6000_BTC_CONST, RS6000_BTC_PURE, RS6000_BTC_FP, RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR, RS6000_BTC_ATTR_MASK, RS6000_BTC_SPR, RS6000_BTC_VOID, RS6000_BTC_CR, RS6000_BTC_OVERLOADED, RS6000_BTC_GIMPLE, RS6000_BTC_MISC_MASK, RS6000_BTC_MEM, RS6000_BTC_SAT, RS6000_BTM_ALWAYS diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index f4a9f24bcc5c..9a63a9eda580 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1423,11 +1423,11 @@ pure vsc __builtin_vsx_ld_elemrev_v16qi (signed long, const void *); LD_ELEMREV_V16QI vsx_ld_elemrev_v16qi {ldvec,endian} ; TODO: There is apparent intent in rs6000-builtin.def to have -; RS6000_BTC_SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are +; SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are ; no def_builtin calls for any of them. At some point, we may want to add a ; set of built-ins for whichever vector types make sense for these. 
pure vsq __builtin_vsx_lxvd2x_v1ti (signed long, const void *); LXVD2X_V1TI vsx_load_v1ti {ldvec} diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 523256a5c9d5..90a357ab7932 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -2247,58 +2247,10 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */ /* #define MACHINE_no_sched_speculative_load */ /* General flags. */ extern int frame_pointer_needed; -/* Classification of the builtin functions as to which switches enable the - builtin, and what attributes it should have. We used to use the target - flags macros, but we've run out of bits, so we now map the options into new - settings used here. */ - -/* Builtin operand count. */ -#define RS6000_BTC_UNARY 0x0001 /* normal unary function. */ -#define RS6000_BTC_BINARY 0x0002 /* normal binary function. */ -#define RS6000_BTC_TERNARY 0x0003 /* normal ternary function. */ -#define RS6000_BTC_QUATERNARY 0x0004 /* normal quaternary - function. */ -#define RS6000_BTC_QUINARY 0x0005 /* normal quinary function. */ -#define RS6000_BTC_SENARY 0x0006 /* normal senary function. */ -#define RS6000_BTC_OPND_MASK 0x0007 /* Mask to isolate operands. */ - -/* Builtin attributes. */ -#define RS6000_BTC_SPECIAL 0x /* Special function. */ -#define RS6000_BTC_PREDICATE 0x0008 /* predicate function. */ -#define RS6000_BTC_ABS 0x0010 /* Altivec/VSX ABS - function. */ -#define RS6000_BTC_DST 0x0020 /* Altivec DST function. */ - -#define RS6000_BTC_TYPE_MASK 0x003f /* Mask to isolate types */ - -#define RS6000_BTC_MISC0x /* No special attributes. */ -#define RS6000_BTC_CONST 0x0100 /* Neither uses, nor - modifies global state. */ -#define RS6000_BTC_PURE0x0200 /* reads global - state/mem and does - not modify global state. */ -#define RS6000_BTC_FP 0x0400 /* depends on rounding mode. */ -#define RS6000_BTC_QUAD0x0800 /* Uses a register quad. */ -#define RS6000_BTC_PAIR0x1000 /* Uses a register pair. 
*/ -#define RS6000_BTC_QUADPAIR0x1800 /* Uses a quad and a pair. */ -#define RS6000_BTC_ATTR_MASK 0x1f00 /* Mask of the attributes. */ - -/* Miscellaneous information. */ -#define RS6000_BTC_SPR 0x0100 /* function references SPRs. */ -#define RS6000_BTC_VOID0x0200 /* function has no return value. */ -#define RS6000_BTC_CR 0x0400 /* function references a CR. */
Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109.
On Tue, 2022-05-17 at 23:15 -0400, Michael Meissner wrote:
> On Fri, May 13, 2022 at 01:20:30PM -0500, will schmidt wrote:
> > On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote:
> > >
> > > gcc/
> > >     PR target/103109
> > >     * config/rs6000/rs6000.md (su_int32): New code attribute.
> > >     (mul3): Convert from define_expand to
> > >     define_insn_and_split.
> > >     (maddld4): Add generator function.
> >
> > -(define_insn "*maddld4"
> > +(define_insn "maddld4"
> >
> > Is the removal of the "*" considered adding a generator?  (That's
> > terminology that I'm not immediately familiar with).
>
> Yes.  If you have a pattern:
>
>     (define_insn "foosi2"
>       [(set (match_operand:SI 0 "register_operand" "=r")
>             (foo:SI (match_operand:SI 1 "register_operand" "r")))]
>       ""
>       "foo %0,%1")
>
> It creates a 'gen_foosi2' function that has 2 arguments, and it makes
> the insn listed.
>
> It then has support for insn recognition and output.
>
> If the pattern starts with a '*', there is no 'gen_foosi2' function
> created, but the insn recognition and output are still done.
>
> In practice, you typically use the '*' names for patterns that are
> used as the targets of combination, or separate insns for different
> machines.
>
> Here is the verbiage from rtl.texi:
>
>     These names serve one of two purposes.  The first is to indicate
>     that the instruction performs a certain standard job for the
>     RTL-generation pass of the compiler, such as a move, an addition,
>     or a conditional jump.  The second is to help the target generate
>     certain target-specific operations, such as when implementing
>     target-specific intrinsic functions.
>
>     It is better to prefix target-specific names with the name of the
>     target, to avoid any clash with current or future standard names.
>
>     The absence of a name is indicated by writing an empty string
>     where the name should go.  Nameless instruction patterns are never
>     used for generating RTL code, but they may permit several simpler
>     insns to be combined later on.
>
>     For the purpose of debugging the compiler, you may also specify a
>     name beginning with the @samp{*} character.  Such a name is used
>     only for identifying the instruction in RTL dumps; it is
>     equivalent to having a nameless pattern for all other purposes.
>     Names beginning with the @samp{*} character are not required to be
>     unique.

Thanks for the explanation. :-)
-Will
Re: [PATCH, rs6000] Optimize pcrel access of globals (updated, ping)
On Wed, 2020-11-04 at 12:10 -0600, acsawdey--- via Gcc-patches wrote: > From: Aaron Sawdey > > Ping, as it has been a while. > This also includes a slight fix to make sure that all references can get > optimized. > I've read over what I could. a few nits below, nothing significant jumped out at me, also not my area of expertise. :-) comments inline below. thanks -WIll > This patch implements a RTL pass that looks for pc-relative loads of the > address of an external variable using the PCREL_GOT relocation and a > single load or store that uses that external address. > > Produced by a cast of thousands: > * Michael Meissner > * Peter Bergner > * Bill Schmidt > * Alan Modra > * Segher Boessenkool > * Aaron Sawdey > > Passes bootstrap/regtest on ppc64le power10. OK for trunk? Any impact to non-power10 targets? (power9,power8, or BE, ...) > > gcc/ChangeLog: > > * config.gcc: Add pcrel-opt.o. pcrel-opt.c and pcrel-opt.o entries. > * config/rs6000/pcrel-opt.c: New file. > * config/rs6000/pcrel-opt.md: New file. > * config/rs6000/predicates.md: Add d_form_memory predicate. > * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT. > * config/rs6000/rs6000-passes.def: Add pass_pcrel_opt. > * config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(), > offsettable_non_prefixed_memory(), output_pcrel_opt_reloc(), > and make_pass_pcrel_opt(). > * config/rs6000/rs6000.c (reg_to_non_prefixed): Make global. > (rs6000_option_override_internal): Add pcrel-opt. > (rs6000_delegitimize_address): Support pcrel-opt. > (rs6000_opt_masks): Add pcrel-opt. > (offsettable_non_prefixed_memory): New function. > (reg_to_non_prefixed): Make global. > (rs6000_asm_output_opcode): Reset next_insn_prefixed_p. > (output_pcrel_opt_reloc): New function. > * config/rs6000/rs6000.md (loads_extern_addr): New attr. > (pcrel_extern_addr): Set loads_extern_addr. > Add include for pcrel-opt.md. > * config/rs6000/rs6000.opt: Add -mpcrel-opt. 
> * config/rs6000/t-rs6000: Add rules for pcrel-opt.c and > pcrel-opt.md. indent. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pcrel-opt-inc-di.c: New test. > * gcc.target/powerpc/pcrel-opt-ld-df.c: New test. > * gcc.target/powerpc/pcrel-opt-ld-di.c: New test. > * gcc.target/powerpc/pcrel-opt-ld-hi.c: New test. > * gcc.target/powerpc/pcrel-opt-ld-qi.c: New test. > * gcc.target/powerpc/pcrel-opt-ld-sf.c: New test. > * gcc.target/powerpc/pcrel-opt-ld-si.c: New test. > * gcc.target/powerpc/pcrel-opt-ld-vector.c: New test. > * gcc.target/powerpc/pcrel-opt-st-df.c: New test. > * gcc.target/powerpc/pcrel-opt-st-di.c: New test. > * gcc.target/powerpc/pcrel-opt-st-hi.c: New test. > * gcc.target/powerpc/pcrel-opt-st-qi.c: New test. > * gcc.target/powerpc/pcrel-opt-st-sf.c: New test. > * gcc.target/powerpc/pcrel-opt-st-si.c: New test. > * gcc.target/powerpc/pcrel-opt-st-vector.c: New test. > --- > gcc/config.gcc| 6 +- > gcc/config/rs6000/pcrel-opt.c | 888 ++ > gcc/config/rs6000/pcrel-opt.md| 386 > gcc/config/rs6000/predicates.md | 23 + > gcc/config/rs6000/rs6000-cpus.def | 2 + > gcc/config/rs6000/rs6000-passes.def | 8 + > gcc/config/rs6000/rs6000-protos.h | 4 + > gcc/config/rs6000/rs6000.c| 116 ++- > gcc/config/rs6000/rs6000.md | 8 +- > gcc/config/rs6000/rs6000.opt | 4 + > gcc/config/rs6000/t-rs6000| 7 +- > .../gcc.target/powerpc/pcrel-opt-inc-di.c | 18 + > .../gcc.target/powerpc/pcrel-opt-ld-df.c | 36 + > .../gcc.target/powerpc/pcrel-opt-ld-di.c | 43 + > .../gcc.target/powerpc/pcrel-opt-ld-hi.c | 42 + > .../gcc.target/powerpc/pcrel-opt-ld-qi.c | 42 + > .../gcc.target/powerpc/pcrel-opt-ld-sf.c | 42 + > .../gcc.target/powerpc/pcrel-opt-ld-si.c | 41 + > .../gcc.target/powerpc/pcrel-opt-ld-vector.c | 36 + > .../gcc.target/powerpc/pcrel-opt-st-df.c | 36 + > .../gcc.target/powerpc/pcrel-opt-st-di.c | 37 + > .../gcc.target/powerpc/pcrel-opt-st-hi.c | 42 + > .../gcc.target/powerpc/pcrel-opt-st-qi.c | 42 + > .../gcc.target/powerpc/pcrel-opt-st-sf.c | 36 + > 
.../gcc.target/powerpc/pcrel-opt-st-si.c | 41 + > .../gcc.target/powerpc/pcrel-opt-st-vector.c | 36 + > 26 files changed, 2013 insertions(+), 9 deletions(-) > create mode 100644 gcc/config/rs6000/pcrel-opt.c > create mode 100644 gcc/config/rs6000/pcrel-opt.md > create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c > create mod
Re: [PATCH, rs6000] Update instruction attributes for Power10
On Wed, 2020-11-04 at 14:42 -0600, Pat Haugen via Gcc-patches wrote:
> Update instruction attributes for Power10.
>
> This patch updates the type/prefixed/dot/size attributes for various new
> instructions (and a couple existing that were incorrect) in preparation for
> the Power10 scheduling patch that will be following.
>
> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok
> for trunk?
>
> -Pat
>
> 2020-11-04  Pat Haugen
>
> gcc/
>     * config/rs6000/altivec.md (vsdb_, xxspltiw_v4si,
>     xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst,
>     xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst,
>     vstrir_code_, vstrir_p_code_, vstril_code_,
>     vstril_p_code_, altivec_lvsl_reg, altivec_lvsl_direct,
>     altivec_lvsr_reg, altivec_lvsr_direct, xxeval, vcfuged, vclzdm,
>     vctzdm, vpdepd, vpextd, vgnb, vclrlb, vclrrb): Update instruction
>     attributes for Power10.
>     * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1,
>     floatditd2, ftrunc2, fixdi2, dfp_ddedpd_,
>     dfp_denbcd_, dfp_dxex_, dfp_diex_,
>     *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Likewise.
>     * config/rs6000/mma.md (*movpoi, mma_, mma_,
>     mma_, mma_, mma_, mma_,
>     mma_, mma_, mma_, mma_):
>     Likewise.
>     * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add 'p' for
>     PREFIXED_YES.

The code change reads as roughly
  - next_insn_prefixed_p != PREFIXED_NO
  + next_insn_prefixed_p == PREFIXED_YES
So just an inversion of the logic?  I don't obviously see the 'p' impact
there.

>     * config/rs6000/rs6000.md (define_attr "size"): Add 256.
>     (define_attr "prefixed"): Add 'always'.
>     (define_mode_attr bits): Add DD/TD modes.
>     (cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg, bswapsi2_reg,
>     bswapdi2_brd, setbc_signed_,
>     *setbcr_signed_, *setnbc_signed_,
>     *setnbcr_signed_): Update instruction attributes for
>     Power10.

ok.  (assuming the assorted 'integer' -> 'crypto' changes are correct, of
course).

>     * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti,
>     store_conditionalpti): Update instruction attributes for Power10.
>     * config/rs6000/vsx.md (*xvtlsbb_internal, xxgenpcvm__internal,
>     vextractl_internal, vextractr_internal,
>     vinsertvl_internal_, vinsertvr_internal_,
>     vinsertgl_internal_, vinsertgr_internal_,
>     vreplace_elt__inst): Likewise.

lgtm,
thanks
-Will
Re: [PATCH,rs6000] Add patterns for combine to support p10 fusion
On Wed, 2020-11-04 at 12:12 -0600, Aaron Sawdey via Gcc-patches wrote: > Ping. > > Aaron Sawdey, Ph.D. saw...@linux.ibm.com > IBM Linux on POWER Toolchain > > > > On Oct 26, 2020, at 4:44 PM, acsaw...@linux.ibm.com wrote: > > > > From: Aaron Sawdey > > Hi, > > This patch adds the first couple patterns to support p10 fusion. These > > will allow combine to create a single insn for a pair of instructions > > that that power10 can fuse and execute. These particular ones have the that the power10 s/particular ones/particular insns/ > > requirement that only cr0 can be used when fusing a load with a compare > > immediate of -1/0/1, so we want combine to put that requirement in, and > > if it doesn't work out later the splitter can get used. > > > > This also adds option -mpower10-fusion which defaults on for power10 and > > will gate all these fusion patterns. In addition I have added an > > undocumented option -mpower10-fusion-ld-cmpi (which may be removed later) > > that just controls the load+compare-immediate patterns. ok > > I have make made > > these default on for power10 but they are not disallowed for earlier to on > > processors because it is still valid code. This allows us to test the > > correctness of fusion code generation by turning it on explicitly. > > > > The intention is to work through more patterns of this style to support > > the rest of the power10 fusion pairs. > > > > Bootstrap and regtest looks good on ppc64le power9 with these patterns > > enabled in stage2/stage3 and for regtest. Ok for trunk? > > > > gcc/ChangeLog: > > > > * config/rs6000/predicates.md: Add const_me_to_1_operand. > > * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and > > OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER. to ... and OTHER_P9_VECTOR_MASKS > > * config/rs6000/rs6000-protos.h (address_ok_for_form): Add > > prototype. 
> > * config/rs6000/rs6000.c (rs6000_option_override_internal): > > automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi > > if target is power10. (rs600_opt_masks): Allow -mpower10-fusion > > in function attributes. (address_ok_for_form): New function. ok > > * config/rs6000/rs6000.h: Add MASK_P10_FUSION. > > * config/rs6000/rs6000.md (*ld_cmpi_cr0): New > > define_insn_and_split. > > (*lwa_cmpdi_cr0): New define_insn_and_split. > > (*lwa_cmpwi_cr0): New define_insn_and_split. > > * config/rs6000/rs6000.opt: Add -mpower10-fusion > > and -mpower10-fusion-ld-cmpi. > > --- > > gcc/config/rs6000/predicates.md | 5 +++ > > gcc/config/rs6000/rs6000-cpus.def | 6 ++- > > gcc/config/rs6000/rs6000-protos.h | 2 + > > gcc/config/rs6000/rs6000.c| 34 > > gcc/config/rs6000/rs6000.h| 1 + > > gcc/config/rs6000/rs6000.md | 68 +++ > > gcc/config/rs6000/rs6000.opt | 8 > > 7 files changed, 123 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/config/rs6000/predicates.md > > b/gcc/config/rs6000/predicates.md > > index 4c2fe7fa312..b75c1ddfb69 100644 > > --- a/gcc/config/rs6000/predicates.md > > +++ b/gcc/config/rs6000/predicates.md > > @@ -297,6 +297,11 @@ (define_predicate "const_0_to_1_operand" > > (and (match_code "const_int") > >(match_test "IN_RANGE (INTVAL (op), 0, 1)"))) > > > > +;; Match op = -1, op = 0, or op = 1. > > +(define_predicate "const_m1_to_1_operand" > > + (and (match_code "const_int") > > + (match_test "IN_RANGE (INTVAL (op), -1, 1)"))) > > + > > ;; Match op = 0..3. 
> > (define_predicate "const_0_to_3_operand" > > (and (match_code "const_int") ok > > diff --git a/gcc/config/rs6000/rs6000-cpus.def > > b/gcc/config/rs6000/rs6000-cpus.def > > index 8d2c1ffd6cf..3e65289d8df 100644 > > --- a/gcc/config/rs6000/rs6000-cpus.def > > +++ b/gcc/config/rs6000/rs6000-cpus.def > > @@ -82,7 +82,9 @@ > > > > #define ISA_3_1_MASKS_SERVER(ISA_3_0_MASKS_SERVER > > \ > > | OPTION_MASK_POWER10 \ > > -| OTHER_POWER10_MASKS) > > +| OTHER_POWER10_MASKS \ > > +| OPTION_MASK_P10_FUSION \ > > +| OPTION_MASK_P10_FUSION_LD_CMPI) > > > > /* Flags that need to be turned off if -mno-power9-vector. */ > > #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW > > \ > > @@ -129,6 +131,8 @@ > > | OPTION_MASK_FLOAT128_KEYWORD \ > > | OPTION_MASK_FPRND\ > > | OPTION_MASK_POWER10 \ > > +| OPTION_MASK_P10_FUSION \ > > +| OPTION_MASK_P10_FUSION_LD_CMPI \ > > | OP
Re: [PATCH, rs6000] Update instruction attributes for Power10
On Fri, 2020-11-06 at 10:46 -0600, Pat Haugen wrote:
> On 11/5/20 4:32 PM, will schmidt wrote:
> > On Wed, 2020-11-04 at 14:42 -0600, Pat Haugen via Gcc-patches
> > wrote:
> > > * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add
> > > 'p' for
> > > PREFIXED_YES.
> >
> > The code change reads as roughly
> > - next_insn_prefixed_p != PREFIXED_NO
> >
> > + next_insn_prefixed_p == PREFIXED_YES
> >
> > So just an inversion of the logic? I don't obviously see the 'p'
> > impact
> > there.
> >
>
> It's no longer an inversion of the logic since I added a
> PREFIXED_ALWAYS value. 'next_insn_prefixed' is used by
> rs6000_final_prescan_insn() to determine whether an insn mnemonic
> needs a 'p' prefix. We want it set for PREFIXED_YES, but not for
> PREFIXED_NO or PREFIXED_ALWAYS.

Ok. So the next_insn_prefixed_p indicates whether the instruction
has/gets/needs a p prefix. gotcha. thanks for clarifying. :-)

thanks
-will

>
> >
> > > * config/rs6000/rs6000.md (define_attr "size"): Add 256.
> > > (define_attr "prefixed"): Add 'always'.
> > > (define_mode_attr bits): Add DD/TD modes.
> > > (cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg,
> > > bswapsi2_reg,
> > > bswapdi2_brd, setbc_signed_,
> > > *setbcr_signed_, *setnbc_signed_,
> > > *setnbcr_signed_): Update instruction attributes
> > > for
> > > Power10.
> >
> > ok. (assuming the assorted 'integer' -> 'crypto' changes are
> > correct,
> > of course).
> >
>
> Yes, crypto represents the correct pipe the insns are executed on.
>
> Thanks for the review,
> Pat
>
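For anyone reading the thread later, the point about the third attribute value can be sketched in a few lines of stand-alone C. This is only an illustration, not the rs6000.c code: the enum name prefixed_attr, the enumerator order, and the helper names old_adds_p and new_adds_p are assumptions made up for the example; only the three PREFIXED_* names and the two comparisons come from the discussion above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "prefixed" attribute values discussed
   in the thread.  The enumerator order is an assumption.  */
enum prefixed_attr { PREFIXED_NO, PREFIXED_YES, PREFIXED_ALWAYS };

/* Old test as paraphrased in the review: anything other than
   PREFIXED_NO asks for the 'p'.  */
static bool
old_adds_p (enum prefixed_attr a)
{
  return a != PREFIXED_NO;
}

/* New test as paraphrased in the review: only PREFIXED_YES asks for
   the 'p'.  */
static bool
new_adds_p (enum prefixed_attr a)
{
  return a == PREFIXED_YES;
}
```

With only two values the two tests agree. They differ for PREFIXED_ALWAYS, where the old test says yes and the new one says no, which matches Pat's description that the flag should not be set for PREFIXED_NO or PREFIXED_ALWAYS.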
Re: [PATCH 01/18] rs6000: Handle overloads during program parsing
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:

Hi,
Just a couple cosmetic nits noted below, the majority of which are also in
the original code this is based on.
Thanks
-Will

> Although this patch looks quite large, the changes are fairly minimal.
> Most of it is duplicating the large function that does the overload
> resolution using the automatically generated data structures instead of
> the old hand-generated ones. This doesn't make the patch terribly easy to
> review, unfortunately. Just be aware that generally we aren't changing
> the logic and functionality of overload handling.

ok

>
> 2021-08-31 Bill Schmidt
>
> gcc/
> * config/rs6000/rs6000-c.c (rs6000-builtins.h): New include.
> (altivec_resolve_new_overloaded_builtin): New forward decl.
> (rs6000_new_builtin_type_compatible): New function.
> (altivec_resolve_overloaded_builtin): Call
> altivec_resolve_new_overloaded_builtin.
> (altivec_build_new_resolved_builtin): New function.
> (altivec_resolve_new_overloaded_builtin): Likewise.
> * config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported):
> Likewise.
> * config/rs6000/rs6000-gen-builtins.c (write_decls): Remove _p from
> name of rs6000_new_builtin_is_supported.

ok

> ---
>  gcc/config/rs6000/rs6000-c.c            | 1088 +++
>  gcc/config/rs6000/rs6000-call.c         |   53 ++
>  gcc/config/rs6000/rs6000-gen-builtins.c |    2 +-
>  3 files changed, 1142 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index afcb5bb6e39..aafb4e6a98f 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -35,6 +35,9 @@
> #include "langhooks.h"
> #include "c/c-tree.h"
>
> +#include "rs6000-builtins.h"
> +
> +static tree altivec_resolve_new_overloaded_builtin (location_t, tree, void
> *);
>
>
> /* Handle the machine specific pragma longcall.
Its syntax is > @@ -811,6 +814,30 @@ is_float128_p (tree t) > && t == long_double_type_node)); > } > > +static bool > +rs6000_new_builtin_type_compatible (tree t, tree u) > +{ > + if (t == error_mark_node) > +return false; > + > + if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (u)) > +return true; > + > + if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128 > + && is_float128_p (t) && is_float128_p (u)) > +return true; > + > + if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u)) > +{ > + t = TREE_TYPE (t); > + u = TREE_TYPE (u); > + if (TYPE_READONLY (u)) > + t = build_qualified_type (t, TYPE_QUAL_CONST); > +} > + > + return lang_hooks.types_compatible_p (t, u); > +} > + ok > static inline bool > rs6000_builtin_type_compatible (tree t, int id) > { > @@ -927,6 +954,10 @@ tree > altivec_resolve_overloaded_builtin (location_t loc, tree fndecl, > void *passed_arglist) > { > + if (new_builtins_are_live) > +return altivec_resolve_new_overloaded_builtin (loc, fndecl, > +passed_arglist); > + >vec *arglist = static_cast *> > (passed_arglist); >unsigned int nargs = vec_safe_length (arglist); >enum rs6000_builtins fcode ok > @@ -1930,3 +1961,1060 @@ altivec_resolve_overloaded_builtin (location_t loc, > tree fndecl, > return error_mark_node; >} > } > + > +/* Build a tree for a function call to an Altivec non-overloaded builtin. > + The overloaded builtin that matched the types and args is described > + by DESC. The N arguments are given in ARGS, respectively. > + > + Actually the only thing it does is calling fold_convert on ARGS, with > + a small exception for vec_{all,any}_{ge,le} predicates. 
*/ > + > +static tree > +altivec_build_new_resolved_builtin (tree *args, int n, tree fntype, > + tree ret_type, > + rs6000_gen_builtins bif_id, > + rs6000_gen_builtins ovld_id) > +{ > + tree argtypes = TYPE_ARG_TYPES (fntype); > + tree arg_type[MAX_OVLD_ARGS]; > + tree fndecl = rs6000_builtin_decls_x[bif_id]; > + tree call; > + > + for (int i = 0; i < n; i++) > +arg_type[i] = TREE_VALUE (argtypes), argtypes = TREE_CHAIN (argtypes); > + > + /* The AltiVec overloading implementation is overall gross, but this > + is particularly disgusting. The vec_{all,any}_{ge,le} builtins > + are completely different for floating-point vs. integer vector > + types, because the former has vcmpgefp, but the latter should use > + vcmpgtXX. > + > + In practice, the second and third arguments are swapped, and the > + condition (LT vs. EQ, which is recognizable by bit 1 of the first > + argument) is reversed. Patch the arguments here before building > + the resolved CALL_EXPR. */ > + if (n == 3 > + && ovld_id == RS6000_OVLD_VEC_CMPGE_P > + && bif_id != RS6000_BIF_VCMPGEFP
Re: [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> I over-restricted use of __builtin_mffsl, since I was unaware that it
> automatically uses mffs when mffsl is not available. Paul Clarke
> pointed
> this out in discussion of his SSE 4.1 compatibility patches.
>
> 2021-08-31 Bill Schmidt
>
> gcc/
> * config/rs6000/rs6000-call.c (__builtin_mffsl): Move from
> [power9]
> to [always].
> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def
> b/gcc/config/rs6000/rs6000-builtin-new.def
> index 6a28d5189f8..a8c6b9e988f 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -208,6 +208,12 @@
>    double __builtin_mffs ();
>      MFFS rs6000_mffs {}
>
> +; Although the mffsl instruction is only available on POWER9 and
> later
> +; processors, this builtin automatically falls back to mffs on older
> +; platforms. Thus it appears here in the [always] stanza.
> +  double __builtin_mffsl ();
> +    MFFSL rs6000_mffsl {}
> +
>  ; This thing really assumes long double == __ibm128, and I'm told it
> has
>  ; been used as such within libgcc. Given that __builtin_pack_ibm128
>  ; exists for the same purpose, this should really not be used at
> all.
> @@ -2784,9 +2790,6 @@
>    signed long long __builtin_darn_raw ();
>      DARN_RAW darn_raw {}
>
> -  double __builtin_mffsl ();
> -    MFFSL rs6000_mffsl {}
> -
>    const signed int __builtin_dtstsfi_eq_dd (const int<6>,
> _Decimal64);
>      TSTSFI_EQ_DD dfptstsfi_eq_dd {}
>

Looks reasonable,
Thanks
-Will
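As a rough illustration of what the [always] placement means for user code, here is a small sketch of a caller that does not gate on -mcpu=power9. The wrapper name read_fpscr_light and the non-PowerPC fallback branch are made up for the example, and the preprocessor guard is an assumption about which compilers provide the builtin (a hard-float PowerPC GCC of the era discussed here); only the prototype "double __builtin_mffsl ()" is taken from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: copies the FPSCR image returned by
   __builtin_mffsl into *bits and returns 1, or returns 0 with *bits
   cleared on targets where the builtin is assumed to be missing.  */
static int
read_fpscr_light (uint64_t *bits)
{
#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__powerpc__) || defined(__powerpc64__))
  /* No POWER9 check here: per the patch description the builtin
     falls back to mffs when mffsl is not available.  */
  double image = __builtin_mffsl ();
  memcpy (bits, &image, sizeof image);
  return 1;
#else
  /* Not a PowerPC GCC build; nothing to read in this sketch.  */
  *bits = 0;
  return 0;
#endif
}
```

The only point of the sketch is the missing CPU check in the PowerPC branch; everything around it is scaffolding so the file also compiles elsewhere.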
Re: [PATCH 03/18] rs6000: Handle gimple folding of target built-ins
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote: > This is another patch that looks bigger than it really is. Because we > have a new namespace for the builtins, allowing us to have both the old > and new builtin infrastructure supported at once, we need versions of > these functions that use the new builtin namespace. Otherwise the code is > unchanged. > > 2021-08-31 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin): > New forward decl. > (rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin. > (rs6000_new_builtin_valid_without_lhs): New function. > (rs6000_gimple_fold_new_mma_builtin): Likewise. > (rs6000_gimple_fold_new_builtin): Likewise. > --- > gcc/config/rs6000/rs6000-call.c | 1165 +++ > 1 file changed, 1165 insertions(+) > > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index 2c68aa3580c..eae4e15df1e 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, > machine_mode, > static void rs6000_common_init_builtins (void); > static void htm_init_builtins (void); > static void mma_init_builtins (void); > +static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi); > > > /* Hash table to keep track of the argument types for builtin functions. */ > @@ -12024,6 +12025,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator > *gsi) > bool > rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) > { > + if (new_builtins_are_live) > +return rs6000_gimple_fold_new_builtin (gsi); > + >gimple *stmt = gsi_stmt (*gsi); >tree fndecl = gimple_call_fndecl (stmt); >gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == > BUILT_IN_MD); ok > @@ -12971,6 +12975,35 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator > *gsi) >return false; > } > > +/* Helper function to sort out which built-ins may be valid without having > +a LHS. 
*/ > +static bool > +rs6000_new_builtin_valid_without_lhs (enum rs6000_gen_builtins fn_code, > + tree fndecl) > +{ > + if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node) > +return true; Is that a better or improved version of the code as seen in rs6000_builtin_valid_without_lhs ? That is > if (rs6000_builtin_info[fn_code].attr & RS6000_BTC_VOID) >return true; ok either way. > + > + switch (fn_code) > +{ > +case RS6000_BIF_STVX_V16QI: > +case RS6000_BIF_STVX_V8HI: > +case RS6000_BIF_STVX_V4SI: > +case RS6000_BIF_STVX_V4SF: > +case RS6000_BIF_STVX_V2DI: > +case RS6000_BIF_STVX_V2DF: > +case RS6000_BIF_STXVW4X_V16QI: > +case RS6000_BIF_STXVW4X_V8HI: > +case RS6000_BIF_STXVW4X_V4SF: > +case RS6000_BIF_STXVW4X_V4SI: > +case RS6000_BIF_STXVD2X_V2DF: > +case RS6000_BIF_STXVD2X_V2DI: > + return true; > +default: > + return false; > +} > +} > + > /* Check whether a builtin function is supported in this target > configuration. */ > bool > @@ -13024,6 +13057,1138 @@ rs6000_new_builtin_is_supported (enum > rs6000_gen_builtins fncode) >gcc_unreachable (); > } > > +/* Expand the MMA built-ins early, so that we can convert the > pass-by-reference > + __vector_quad arguments into pass-by-value arguments, leading to more > + efficient code generation. */ > +static bool > +rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi, > + rs6000_gen_builtins fn_code) > +{ > + gimple *stmt = gsi_stmt (*gsi); > + size_t fncode = (size_t) fn_code; > + > + if (!bif_is_mma (rs6000_builtin_info_x[fncode])) > +return false; > + > + /* Each call that can be gimple-expanded has an associated built-in > + function that it will expand into. If this one doesn't, we have > + already expanded it! 
*/ > + if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE) > +return false; > + > + bifdata *bd = &rs6000_builtin_info_x[fncode]; > + unsigned nopnds = bd->nargs; > + gimple_seq new_seq = NULL; > + gimple *new_call; > + tree new_decl; > + > + /* Compatibility built-ins; we used to call these > + __builtin_mma_{dis,}assemble_pair, but now we call them > + __builtin_vsx_{dis,}assemble_pair. Handle the old versions. */ > + if (fncode == RS6000_BIF_ASSEMBLE_PAIR) > +fncode = RS6000_BIF_ASSEMBLE_PAIR_V; > + else if (fncode == RS6000_BIF_DISASSEMBLE_PAIR) > +fncode = RS6000_BIF_DISASSEMBLE_PAIR_V; > + > + if (fncode == RS6000_BIF_DISASSEMBLE_ACC > + || fncode == RS6000_BIF_DISASSEMBLE_PAIR_V) > +{ > + /* This is an MMA disassemble built-in function. */ > + push_gimplify_context (true); > + unsigned nvec = (fncode == RS6000_BIF_DISASSEMBLE_ACC) ? 4 : 2; > + tree dst_ptr = gimple_call_arg (stmt, 0); > + tree src_ptr = gimple_call_ar
Re: [PATCH 04/18] rs6000: Handle some recent MMA builtin changes
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote: > Peter Bergner recently added two new builtins __builtin_vsx_lxvp and > __builtin_vsx_stxvp. These happened to break a pattern in MMA builtins that > I had been using to automate gimple folding of MMA builtins. Previously, > every MMA function that could be folded had an associated internal function > that it was folded into. The LXVP/STXVP builtins are just folded directly > into memory operations. > > Instead of relying on this pattern, this patch adds a new attribute to > builtins called "mmaint," which is set for all MMA builtins that have an > associated internal builtin. The naming convention that adds _INTERNAL to > the builtin index name remains. > > The rest of the patch is just duplicating Peter's patch, using the new > builtin infrastructure. > > 2021-08-23 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint flag. > (ASSEMBLE_PAIR): Likewise. > (BUILD_ACC): Likewise. > (DISASSEMBLE_ACC): Likewise. > (DISASSEMBLE_PAIR): Likewise. > (PMXVBF16GER2): Likewise. > (PMXVBF16GER2NN): Likewise. > (PMXVBF16GER2NP): Likewise. > (PMXVBF16GER2PN): Likewise. > (PMXVBF16GER2PP): Likewise. > (PMXVF16GER2): Likewise. > (PMXVF16GER2NN): Likewise. > (PMXVF16GER2NP): Likewise. > (PMXVF16GER2PN): Likewise. > (PMXVF16GER2PP): Likewise. > (PMXVF32GER): Likewise. > (PMXVF32GERNN): Likewise. > (PMXVF32GERNP): Likewise. > (PMXVF32GERPN): Likewise. > (PMXVF32GERPP): Likewise. > (PMXVF64GER): Likewise. > (PMXVF64GERNN): Likewise. > (PMXVF64GERNP): Likewise. > (PMXVF64GERPN): Likewise. > (PMXVF64GERPP): Likewise. > (PMXVI16GER2): Likewise. > (PMXVI16GER2PP): Likewise. > (PMXVI16GER2S): Likewise. > (PMXVI16GER2SPP): Likewise. > (PMXVI4GER8): Likewise. > (PMXVI4GER8PP): Likewise. > (PMXVI8GER4): Likewise. > (PMXVI8GER4PP): Likewise. > (PMXVI8GER4SPP): Likewise. > (XVBF16GER2): Likewise. > (XVBF16GER2NN): Likewise. > (XVBF16GER2NP): Likewise. > (XVBF16GER2PN): Likewise. 
> (XVBF16GER2PP): Likewise. > (XVF16GER2): Likewise. > (XVF16GER2NN): Likewise. > (XVF16GER2NP): Likewise. > (XVF16GER2PN): Likewise. > (XVF16GER2PP): Likewise. > (XVF32GER): Likewise. > (XVF32GERNN): Likewise. > (XVF32GERNP): Likewise. > (XVF32GERPN): Likewise. > (XVF32GERPP): Likewise. > (XVF64GER): Likewise. > (XVF64GERNN): Likewise. > (XVF64GERNP): Likewise. > (XVF64GERPN): Likewise. > (XVF64GERPP): Likewise. > (XVI16GER2): Likewise. > (XVI16GER2PP): Likewise. > (XVI16GER2S): Likewise. > (XVI16GER2SPP): Likewise. > (XVI4GER8): Likewise. > (XVI4GER8PP): Likewise. > (XVI8GER4): Likewise. > (XVI8GER4PP): Likewise. > (XVI8GER4SPP): Likewise. > (XXMFACC): Likewise. > (XXMTACC): Likewise. > (XXSETACCZ): Likewise. > (ASSEMBLE_PAIR_V): Likewise. > (BUILD_PAIR): Likewise. > (DISASSEMBLE_PAIR_V): Likewise. > (LXVP): New. > (STXVP): New. ok > * config/rs6000/rs6000-call.c > (rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and > RS6000_BIF_STXVP. > * config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint. > (parse_bif_attrs): Handle ismmaint. > (write_decls): Add bif_mmaint_bit and bif_is_mmaint. > (write_bif_static_init): Handle ismmaint. 
ok > --- > gcc/config/rs6000/rs6000-builtin-new.def | 145 --- > gcc/config/rs6000/rs6000-call.c | 38 +- > gcc/config/rs6000/rs6000-gen-builtins.c | 38 +++--- > 3 files changed, 135 insertions(+), 86 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtin-new.def > b/gcc/config/rs6000/rs6000-builtin-new.def > index a8c6b9e988f..1966516551e 100644 > --- a/gcc/config/rs6000/rs6000-builtin-new.def > +++ b/gcc/config/rs6000/rs6000-builtin-new.def > @@ -129,6 +129,7 @@ > ; mma Needs special handling for MMA > ; quad MMA instruction using a register quad as an input operand > ; pair MMA instruction using a register pair as an input operand > +; mmaint MMA instruction expanding to internal call at GIMPLE time > ; no32bit Not valid for TARGET_32BIT > ; 32bitRequires different handling for TARGET_32BIT > ; cpu This is a "cpu_is" or "cpu_supports" builtin > @@ -3584,415 +3585,421 @@ > > [mma] >void __builtin_mma_assemble_acc (v512 *, vuc, vuc, vuc, vuc); > -ASSEMBLE_ACC nothing {mma} > +ASSEMBLE_ACC nothing {mma,mmaint} > >v512 __builtin_mma_assemble_acc_internal (vuc, vuc, vuc, vuc); > ASSEMBLE_ACC_INTERNAL mma_assemble_acc {mma} > >void __builtin_mma_assemble_pair (v256 *, vuc, vuc); > -ASSEMBLE_PAIR nothing {mma} > +
Re: [PATCH 05/18] rs6000: Support for vectorizing built-in functions
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote: > This patch just duplicates a couple of functions and adjusts them to use the > new builtin names. There's no logical change otherwise. > > 2021-08-31 Bill Schmidt > > gcc/ > * config/rs6000/rs6000.c (rs6000-builtins.h): New include. > (rs6000_new_builtin_vectorized_function): New function. > (rs6000_new_builtin_md_vectorized_function): Likewise. > (rs6000_builtin_vectorized_function): Call > rs6000_new_builtin_vectorized_function. > (rs6000_builtin_md_vectorized_function): Call > rs6000_new_builtin_md_vectorized_function. ok > --- > gcc/config/rs6000/rs6000.c | 253 + > 1 file changed, 253 insertions(+) > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index b7ea1483da5..52c78c7500c 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -78,6 +78,7 @@ > #include "case-cfn-macros.h" > #include "ppc-auxv.h" > #include "rs6000-internal.h" > +#include "rs6000-builtins.h" > #include "opts.h" > > /* This file should be included last. */ > @@ -5501,6 +5502,251 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct > loop *loop) >return nunroll; > } > > +/* Returns a function decl for a vectorized version of the builtin function > + with builtin function code FN and the result vector type TYPE, or > NULL_TREE > + if it is not available. 
*/ > + > +static tree > +rs6000_new_builtin_vectorized_function (unsigned int fn, tree type_out, > + tree type_in) > +{ > + machine_mode in_mode, out_mode; > + int in_n, out_n; > + > + if (TARGET_DEBUG_BUILTIN) > +fprintf (stderr, "rs6000_new_builtin_vectorized_function (%s, %s, %s)\n", > + combined_fn_name (combined_fn (fn)), > + GET_MODE_NAME (TYPE_MODE (type_out)), > + GET_MODE_NAME (TYPE_MODE (type_in))); > + > + if (TREE_CODE (type_out) != VECTOR_TYPE > + || TREE_CODE (type_in) != VECTOR_TYPE) > +return NULL_TREE; > + > + out_mode = TYPE_MODE (TREE_TYPE (type_out)); > + out_n = TYPE_VECTOR_SUBPARTS (type_out); > + in_mode = TYPE_MODE (TREE_TYPE (type_in)); > + in_n = TYPE_VECTOR_SUBPARTS (type_in); > + > + switch (fn) > +{ > +CASE_CFN_COPYSIGN: > + if (VECTOR_UNIT_VSX_P (V2DFmode) > + && out_mode == DFmode && out_n == 2 > + && in_mode == DFmode && in_n == 2) > + return rs6000_builtin_decls_x[RS6000_BIF_CPSGNDP]; > + if (VECTOR_UNIT_VSX_P (V4SFmode) > + && out_mode == SFmode && out_n == 4 > + && in_mode == SFmode && in_n == 4) > + return rs6000_builtin_decls_x[RS6000_BIF_CPSGNSP]; > + if (VECTOR_UNIT_ALTIVEC_P (V4SFmode) > + && out_mode == SFmode && out_n == 4 > + && in_mode == SFmode && in_n == 4) > + return rs6000_builtin_decls_x[RS6000_BIF_COPYSIGN_V4SF]; > + break; > +CASE_CFN_CEIL: > + if (VECTOR_UNIT_VSX_P (V2DFmode) > + && out_mode == DFmode && out_n == 2 > + && in_mode == DFmode && in_n == 2) > + return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIP]; > + if (VECTOR_UNIT_VSX_P (V4SFmode) > + && out_mode == SFmode && out_n == 4 > + && in_mode == SFmode && in_n == 4) > + return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIP]; > + if (VECTOR_UNIT_ALTIVEC_P (V4SFmode) > + && out_mode == SFmode && out_n == 4 > + && in_mode == SFmode && in_n == 4) > + return rs6000_builtin_decls_x[RS6000_BIF_VRFIP]; > + break; > +CASE_CFN_FLOOR: > + if (VECTOR_UNIT_VSX_P (V2DFmode) > + && out_mode == DFmode && out_n == 2 > + && in_mode == DFmode && in_n == 2) > + return 
rs6000_builtin_decls_x[RS6000_BIF_XVRDPIM]; > + if (VECTOR_UNIT_VSX_P (V4SFmode) > + && out_mode == SFmode && out_n == 4 > + && in_mode == SFmode && in_n == 4) > + return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIM]; > + if (VECTOR_UNIT_ALTIVEC_P (V4SFmode) > + && out_mode == SFmode && out_n == 4 > + && in_mode == SFmode && in_n == 4) > + return rs6000_builtin_decls_x[RS6000_BIF_VRFIM]; > + break; > +CASE_CFN_FMA: > + if (VECTOR_UNIT_VSX_P (V2DFmode) > + && out_mode == DFmode && out_n == 2 > + && in_mode == DFmode && in_n == 2) > + return rs6000_builtin_decls_x[RS6000_BIF_XVMADDDP]; > + if (VECTOR_UNIT_VSX_P (V4SFmode) > + && out_mode == SFmode && out_n == 4 > + && in_mode == SFmode && in_n == 4) > + return rs6000_builtin_decls_x[RS6000_BIF_XVMADDSP]; > + if (VECTOR_UNIT_ALTIVEC_P (V4SFmode) > + && out_mode == SFmode && out_n == 4 > + && in_mode == SFmode && in_n == 4) > + return rs6000_builtin_decls_x[RS6000_BIF_VMADDFP]; > + break; > +CASE_CFN_TRUNC: > + if (VECTOR_UNIT_VSX_P (V2DFmode) > + && out_mode == DFmode && out_n == 2 > + && in_mode == DFmode && in_n == 2) > + return rs6000_builtin_decls_x[RS6000_BIF_XVRDP
Re: [PATCH] rs6000: Add psabi diagnostic for C++ zero-width bit field ABI change (PR102024)
On Tue, 2021-09-21 at 17:35 -0500, Bill Schmidt wrote: > Hi! > > Previously zero-width bit fields were removed from structs, so that otherwise > homogeneous aggregates were treated as such and passed in FPRs and VSRs. > This was incorrect behavior per the ELFv2 ABI. Now that these fields are no > longer being removed, we generate the correct parameter passing code. Alert > the unwary user in the rare cases where this behavior changes. > > As noted in the PR, once the GCC 12 Changes page has text describing this > issue, > we can update the diagnostic message to reference that URL. I'll handle that > in a follow-up patch. > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. > Is this okay for trunk? How previously? is this one that will need all the backports? > > Thanks! > Bill > > > 2021-09-21 Bill Schmidt > > gcc/ > PR target/102024 > * config/rs6000/rs6000-call.c (rs6000_aggregate_candidate): Detect > zero-width bit fields and return indicator. > (rs6000_discover_homogeneous_aggregate): Diagnose when the > presence of a zero-width bit field changes parameter passing in > GCC 12. > > gcc/testsuite/ > PR target/102024 > * g++.target/powerpc/pr102024.C: New. 
ok > --- > gcc/config/rs6000/rs6000-call.c | 39 ++--- > gcc/testsuite/g++.target/powerpc/pr102024.C | 23 > 2 files changed, 57 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/g++.target/powerpc/pr102024.C > > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index 7d485480225..c02b202b0cd 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -6227,7 +6227,7 @@ const struct altivec_builtin_types > altivec_overloaded_builtins[] = { > > static int > rs6000_aggregate_candidate (const_tree type, machine_mode *modep, > - int *empty_base_seen) > + int *empty_base_seen, int *zero_width_bf_seen) > { >machine_mode mode; >HOST_WIDE_INT size; > @@ -6298,7 +6298,8 @@ rs6000_aggregate_candidate (const_tree type, > machine_mode *modep, > return -1; > > count = rs6000_aggregate_candidate (TREE_TYPE (type), modep, > - empty_base_seen); > + empty_base_seen, > + zero_width_bf_seen); > if (count == -1 > || !index > || !TYPE_MAX_VALUE (index) > @@ -6336,6 +6337,12 @@ rs6000_aggregate_candidate (const_tree type, > machine_mode *modep, > if (TREE_CODE (field) != FIELD_DECL) > continue; > > + if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field)) > + { > + *zero_width_bf_seen = 1; > + continue; > + } > + Noting that the definition comes from tree.h and is #define SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD(NODE, VAL) \ do { \ gcc_checking_assert (DECL_BIT_FIELD (NODE));\ FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_0 = (VAL); \ } while (0) ok. 
> if (DECL_FIELD_ABI_IGNORED (field)) > { > if (lookup_attribute ("no_unique_address", > @@ -6347,7 +6354,8 @@ rs6000_aggregate_candidate (const_tree type, > machine_mode *modep, > } > > sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep, > - empty_base_seen); > + empty_base_seen, > + zero_width_bf_seen); > if (sub_count < 0) > return -1; > count += sub_count; > @@ -6381,7 +6389,8 @@ rs6000_aggregate_candidate (const_tree type, > machine_mode *modep, > continue; > > sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep, > - empty_base_seen); > + empty_base_seen, > + zero_width_bf_seen); > if (sub_count < 0) > return -1; > count = count > sub_count ? count : sub_count; > @@ -6423,8 +6432,10 @@ rs6000_discover_homogeneous_aggregate (machine_mode > mode, const_tree type, > { >machine_mode field_mode = VOIDmode; >int empty_base_seen = 0; > + int zero_width_bf_seen = 0; >int field_count = rs6000_aggregate_candidate (type, &field_mode, > - &empty_base_seen); > + &empty_base_seen, > + &zero_width_bf_seen); > That appears to be all of the callers of rs6000_aggregate_candidate. (ok). >if (field_co
Re: PING^4: [RS6000] rotate and mask constants [PR94393]
On Mon, 2021-10-25 at 14:41 -0500, Pat Haugen via Gcc-patches wrote:
> Ping.
>
> On 8/10/21 10:49 AM, Pat Haugen via Gcc-patches wrote:
> > On 7/27/21 1:35 PM, will schmidt wrote:
> > > On Fri, 2021-07-23 at 15:23 -0500, Pat Haugen via Gcc-patches wrote:
> > > > Ping
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html
> > > >
> > > > I've done a current bootstrap/regtest on powerpc64/powerpc64le with
> > > > no regressions.
> > > >
> > > > -Pat
> > >
> > > That patch was previously posted by Alan Modra.
> > > Given the time lapse this may need to be re-posted entirely, pending
> > > what the maintainers suggest.. :-)
> > >
> > >
> >
> > Including full patch.

A non-authoritative re-review. :-)

> >
> >
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 60f406a4ff6..8e758a1f2dd 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -1131,6 +1131,8 @@ static tree rs6000_handle_altivec_attribute (tree *,
> > tree, tree, int, bool *);
> > static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool
> > *);
> > static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
> > static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
> > +static bool rotate_and_mask_constant (unsigned HOST_WIDE_INT,
> > HOST_WIDE_INT *,
> > + int *, unsigned HOST_WIDE_INT *);
> > static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
> > static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *,
> > bool);
> > static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,

Prototype for rotate_and_mask_constant function. OK.

> > @@ -5973,7 +5975,7 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
> > }
> >
> > /* Helper for num_insns_constant.
Allow constants formed by the > > - num_insns_constant_gpr sequences, plus li -1, rldicl/rldicr/rlwinm, > > + num_insns_constant_gpr sequences, and li/lis+rldicl/rldicr/rldic/rlwinm, > > and handle modes that require multiple gprs. */ Ok. > > > > static int > > @@ -5988,8 +5990,8 @@ num_insns_constant_multi (HOST_WIDE_INT value, > > machine_mode mode) > >if (insns > 2 > > /* We won't get more than 2 from num_insns_constant_gpr > > except when TARGET_POWERPC64 and mode is DImode or > > -wider, so the register mode must be DImode. */ > > - && rs6000_is_valid_and_mask (GEN_INT (low), DImode)) > > +wider. */ > > + && rotate_and_mask_constant (low, NULL, NULL, NULL)) > > insns = 2; > >total += insns; > >/* If BITS_PER_WORD is the number of bits in HOST_WIDE_INT, doing > > @@ -10077,6 +10079,244 @@ rs6000_emit_set_const (rtx dest, rtx source) > >return true; > > } > > > > +/* Rotate DImode word, being careful to handle the case where > > + HOST_WIDE_INT is larger than DImode. */ > > + > > +static inline unsigned HOST_WIDE_INT > > +rotate_di (unsigned HOST_WIDE_INT x, unsigned int shift) > > +{ > > + unsigned HOST_WIDE_INT mask_hi, mask_lo; > > + > > + mask_hi = (HOST_WIDE_INT_1U << 63 << 1) - (HOST_WIDE_INT_1U << shift); > > + mask_lo = (HOST_WIDE_INT_1U << shift) - 1; > > + x = ((x << shift) & mask_hi) | ((x >> (64 - shift)) & mask_lo); > > + x = (x ^ (HOST_WIDE_INT_1U << 63)) - (HOST_WIDE_INT_1U << 63); > > + return x; > > +} ok. > > + > > +/* Can C be formed by rotating a 16-bit positive value left by C16LSB? */ Perhaps rephrase as "Attempt to form a constant by ... C16LSB." > > + > > +static inline bool > > +is_rotate_positive_constant (unsigned HOST_WIDE_INT c, int c16lsb, > > +HOST_WIDE_INT *val, int *shift, > > +unsigned HOST_WIDE_INT *mask) > > +{ > > + if ((c & ~(HOST_WIDE_INT_UC (0x7fff) << c16lsb)) == 0) > > +{ > > + /* eg. c = 1100 ... > > +-> val = 0x3000, shift = 49, mask = -1ull. 
*/ > > + if (val) > > + { > > + c >>= c16lsb; > > + /* Make the value and shift canonical in the sense of > > +selecting the smallest value. For the example above > > +-> val = 3, shift = 61. */ Comment seems reasonable. I did not review or confirm the maths involved. > > + int trail_zeros = ctz_hwi (c); > > + c >>= trail_zeros; > > + c16lsb += trail_zeros; > > + *val = c; > > + *shift = c16lsb; > > + *mask = HOST_WIDE_INT_M1U; > > + } > > + return true; > > +} > > + return false; > > +} > > + > > +/* Can C be formed by rotating a 16-bit negative value left by C16LSB? */ > > + > > +static inline bool > > +is_rotate_negative_constant (unsigned HOST_WIDE_INT c, int c16lsb, > > +HOST_WIDE_INT *val, int *shift, > > +unsigned HOST_WIDE_INT *mask) > > +{ > > + if ((c | (HOST_WIDE_INT_UC (0x7fff) << c16lsb)) == HOST_WIDE_INT_M1U) > > +{ > > + if (val) > > + { > > + c >>= c16lsb; > > +
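For anyone following the rotate and "canonical" comments above, here is a minimal standalone sketch of the same arithmetic. It assumes a host type that is exactly 64 bits wide, so the sign-extension fixup at the end of rotate_di is not needed. The names rotl64, fits_positive_16 and canonicalize are mine for illustration only and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by SHIFT bits.  Assumes uint64_t is the
   working type, unlike rotate_di which copes with a wider HOST_WIDE_INT.  */
static uint64_t
rotl64 (uint64_t x, unsigned int shift)
{
  assert (shift < 64);
  if (shift == 0)
    return x;			/* avoid the undefined shift by 64 below */
  return (x << shift) | (x >> (64 - shift));
}

/* Hypothetical helper: is C a positive 16-bit value (at most 0x7fff)
   shifted left by LSB?  Same mask test as is_rotate_positive_constant.  */
static int
fits_positive_16 (uint64_t c, int lsb)
{
  assert (lsb >= 0 && lsb < 64);
  return (c & ~(UINT64_C (0x7fff) << lsb)) == 0;
}

/* Hypothetical helper: pick the smallest value by moving trailing zero
   bits out of the value and into the shift count.  */
static void
canonicalize (uint64_t c, int lsb, uint64_t *val, int *shift)
{
  c >>= lsb;
  while (c != 0 && (c & 1) == 0)
    {
      c >>= 1;
      lsb++;
    }
  *val = c;
  *shift = lsb;
}
```

With c = 0x6000000000000000 and lsb = 49, the value before canonicalizing is 0x3000 and canonicalize yields val = 3, shift = 61, which is the pair named in the quoted comment; rotl64 (3, 61) gives back the original constant.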
Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote: > Add new constant data structure. > > This patch provides the data structure and function to convert a > CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant) to > an array of bytes, half-words, words, and double words that can be loaded > into a 128-bit vector register. > > The next patches will use this data structure to generate code that > generates load of the vector/floating point registers using the XXSPLTIDP, > XXSPLTIW, and LXVKQ instructions that were added in power10. > > 2021-11-05 Michael Meissner > Email here is different than the from:. No big deal either way. > gcc/ > > * config/rs6000/rs6000-protos.h (VECTOR_128BIT_*): New macros. I defer to maintainers. I like to explicitly include the full macro names here so a grep later on can easily find it. > (vec_const_128bit_type): New structure type. > (vec_const_128bit_to_bytes): New declaration. > * config/rs6000/rs6000.c (constant_int_to_128bit_vector): New > helper function. > (constant_fp_to_128bit_vector): New helper function. > (vec_const_128bit_to_bytes): New function. ok > --- > gcc/config/rs6000/rs6000-protos.h | 28 > gcc/config/rs6000/rs6000.c| 253 ++ > 2 files changed, 281 insertions(+) > > diff --git a/gcc/config/rs6000/rs6000-protos.h > b/gcc/config/rs6000/rs6000-protos.h > index 14f6b313105..490d6e33736 100644 > --- a/gcc/config/rs6000/rs6000-protos.h > +++ b/gcc/config/rs6000/rs6000-protos.h > @@ -222,6 +222,34 @@ address_is_prefixed (rtx addr, >return (iform == INSN_FORM_PREFIXED_NUMERIC > || iform == INSN_FORM_PCREL_LOCAL); > } > + > +/* Functions and data structures relating to 128-bit constants that are > + converted to byte, half-word, word, and double-word values. All fields > are > + kept in big endian order. We also convert scalar values to 128-bits if > they > + are going to be loaded into vector registers. 
*/ > +#define VECTOR_128BIT_BITS 128 > +#define VECTOR_128BIT_BYTES (128 / 8) > +#define VECTOR_128BIT_HALF_WORDS (128 / 16) > +#define VECTOR_128BIT_WORDS (128 / 32) > +#define VECTOR_128BIT_DOUBLE_WORDS (128 / 64) ok > + > +typedef struct { > + /* Constant as various sized items. */ > + unsigned HOST_WIDE_INT double_words[VECTOR_128BIT_DOUBLE_WORDS]; > + unsigned int words[VECTOR_128BIT_WORDS]; > + unsigned short half_words[VECTOR_128BIT_HALF_WORDS]; > + unsigned char bytes[VECTOR_128BIT_BYTES]; > + > + unsigned original_size;/* Constant size before splat. */ > + bool fp_constant_p;/* Is the constant floating > point? */ > + bool all_double_words_same;/* Are the double words all > equal? */ > + bool all_words_same; /* Are the words all equal? */ > + bool all_half_words_same; /* Are the halft words all equal? */ half > + bool all_bytes_same; /* Are the bytes all equal? */ > +} vec_const_128bit_type; > + ok. > +extern bool vec_const_128bit_to_bytes (rtx, machine_mode, > +vec_const_128bit_type *); > #endif /* RTX_CODE */ > > #ifdef TREE_CODE > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 01affc7a47c..f285022294a 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -28619,6 +28619,259 @@ rs6000_output_addr_vec_elt (FILE *file, int value) >fprintf (file, "\n"); > } > > + > +/* Copy an integer constant to the vector constant structure. */ > + Here and subsequent comments, I'd debate on whether to enhance the comment to be explicit on the structure name being copied to/from. 
(vec_const_128bit_type is easy to search for, vector or constant or structure are not as unique) > +static void > +constant_int_to_128bit_vector (rtx op, > +machine_mode mode, > +size_t byte_num, > +vec_const_128bit_type *info) > +{ > + unsigned HOST_WIDE_INT uvalue = UINTVAL (op); > + unsigned bitsize = GET_MODE_BITSIZE (mode); > + > + for (int shift = bitsize - 8; shift >= 0; shift -= 8) > +info->bytes[byte_num++] = (uvalue >> shift) & 0xff; > +} I didn't confirm the maths, but looks OK at a glance. > + > +/* Copy an floating point constant to the vector constant structure. */ > + s/an/a/ > +static void > +constant_fp_to_128bit_vector (rtx op, > + machine_mode mode, > + size_t byte_num, > + vec_const_128bit_type *info) > +{ > + unsigned bitsize = GET_MODE_BITSIZE (mode); > + unsigned num_words = bitsize / 32; > + const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op); > + long real_words[VECTOR_128BIT_WORDS]; > + > + /* Make sure we don't overflow the real_words array and that it is > +
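Since I did not confirm the maths in constant_int_to_128bit_vector, here is a small standalone sketch of the same loop that can be checked by hand. It is not the patch's code: store_big_endian and VEC_BYTES are names I made up, and a plain 16-byte array stands in for the bytes field of vec_const_128bit_type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16		/* stand-in for VECTOR_128BIT_BYTES */

/* Store the low BITSIZE bits of VALUE into BYTES starting at BYTE_NUM,
   most significant byte first.  Returns the index just past the last
   byte written, so elements can be appended one after another.  */
static size_t
store_big_endian (uint64_t value, unsigned bitsize, size_t byte_num,
		  unsigned char bytes[VEC_BYTES])
{
  assert (bitsize >= 8 && bitsize <= 64 && bitsize % 8 == 0);
  assert (byte_num + bitsize / 8 <= VEC_BYTES);

  for (int shift = (int) bitsize - 8; shift >= 0; shift -= 8)
    bytes[byte_num++] = (unsigned char) ((value >> shift) & 0xff);

  return byte_num;
}
```

Storing the 32-bit value 0x12345678 at offset 0 writes the bytes 0x12, 0x34, 0x56, 0x78 in that order, which is the "all fields are kept in big endian order" layout described in the header comment.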
Re: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
On Fri, 2021-11-05 at 00:07 -0400, Michael Meissner wrote:
> Add LXVKQ support.
>
> This patch adds support to generate the LXVKQ instruction to load specific
> IEEE-128 floating point constants.
>
> Compared to the last time I submitted this patch, I modified it so that it
> uses the bit pattern of the vector to see if it can generate the LXVKQ
> instruction.  This means on a little endian Power system, the
> following code will generate a LXVKQ 34,16 instruction:
>
>     vector long long foo (void)
>     {
>     #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>       return (vector long long) { 0x0000000000000000, 0x8000000000000000 };
>     #else
>       return (vector long long) { 0x8000000000000000, 0x0000000000000000 };
>     #endif
>     }
>
> because that vector pattern is the same bit pattern as -0.0F128.
>
> 2021-11-05  Michael Meissner
>
> gcc/
>
> 	* config/rs6000/constraints.md (eQ): New constraint.
> 	* config/rs6000/predicates.md (easy_fp_constant): Add support for
> 	generating the LXVKQ instruction.
> 	(easy_vector_constant_ieee128): New predicate.
> 	(easy_vector_constant): Add support for generating the LXVKQ
> 	instruction.
> 	* config/rs6000/rs6000-protos.h (constant_generates_lxvkq): New
> 	declaration.
> 	* config/rs6000/rs6000.c (output_vec_const_move): Add support for
> 	generating LXVKQ.
> 	(constant_generates_lxvkq): New function.
> 	* config/rs6000/rs6000.opt (-mieee128-constant): New debug
> 	option.
> 	* config/rs6000/vsx.md (vsx_mov_64bit): Add support for
> 	generating LXVKQ.
> 	(vsx_mov_32bit): Likewise.
> 	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
> 	eQ constraint.
>
> gcc/testsuite/
>
> 	* gcc.target/powerpc/float128-constant.c: New test.
> --- > gcc/config/rs6000/constraints.md | 6 + > gcc/config/rs6000/predicates.md | 34 > gcc/config/rs6000/rs6000-protos.h | 1 + > gcc/config/rs6000/rs6000.c| 62 +++ > gcc/config/rs6000/rs6000.opt | 4 + > gcc/config/rs6000/vsx.md | 14 ++ > gcc/doc/md.texi | 4 + > .../gcc.target/powerpc/float128-constant.c| 160 ++ > 8 files changed, 285 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-constant.c > > diff --git a/gcc/config/rs6000/constraints.md > b/gcc/config/rs6000/constraints.md > index c8cff1a3038..e72132b4c28 100644 > --- a/gcc/config/rs6000/constraints.md > +++ b/gcc/config/rs6000/constraints.md > @@ -213,6 +213,12 @@ (define_constraint "eI" >"A signed 34-bit integer constant if prefixed instructions are supported." >(match_operand 0 "cint34_operand")) > > +;; A TF/KF scalar constant or a vector constant that can load certain IEEE > +;; 128-bit constants into vector registers using LXVKQ. > +(define_constraint "eQ" > + "An IEEE 128-bit constant that can be loaded into VSX registers." > + (match_operand 0 "easy_vector_constant_ieee128")) > + > ;; Floating-point constraints. These two are defined so that insn > ;; length attributes can be calculated exactly. > ok > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index 956e42bc514..e0d1c718e9f 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant" >if (TARGET_VSX && op == CONST0_RTX (mode)) > return 1; > > + /* Constants that can be generated with ISA 3.1 instructions are easy. */ Easy is relative, but OK. > + vec_const_128bit_type vsx_const; > + if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const)) > +{ > + if (constant_generates_lxvkq (&vsx_const) != 0) > + return true; > +} > + >/* Otherwise consider floating point constants hard, so that the > constant gets pushed to memory during the early RTL phases. 
This
>     has the advantage that double precision constants that can be
> @@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
>    return 0;
>  })
>
> +;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
> +;; via the LXVKQ instruction.
> +
> +(define_predicate "easy_vector_constant_ieee128"
> +  (match_code "const_vector,const_double")
> +{
> +  vec_const_128bit_type vsx_const;
> +
> +  /* Can we generate the LXVKQ instruction?  */
> +  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
> +      || !TARGET_VSX)
> +    return false;

Presumably all of the checks there are valid.  (Can we have power10
without the float128_hw or ieee128_constant flags set?)  I do notice the
addition of an ieee128_constant flag below.

> +
> +  return (vec_const_128bit_to_bytes (op, mode, &vsx_const)
> +	  && constant_generates_lxvkq (&vsx_const) != 0);
> +})
> +

ok

> ;; Return 1 if the operand is a constant that ca
Re: [PATCH 3/5] Add Power10 XXSPLTIW
On Fri, 2021-11-05 at 00:09 -0400, Michael Meissner wrote: > Generate XXSPLTIW on power10. > Hi, > This patch adds support to automatically generate the ISA 3.1 XXSPLTIW > instruction for V8HImode, V4SImode, and V4SFmode vectors. It does this by > adding support for vector constants that can be used, and adding a > VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction. > > The eP constraint was added to recognize constants that can be loaded into > vector registers with a single prefixed instruction. Perhaps Swap "... the eP constraint was added ..." for "Add the eP constraint to ..." > > I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector > constants. > > 2021-11-05 Michael Meissner > > gcc/ > > * config/rs6000/constraints.md (eP): Update comment. > * config/rs6000/predicates.md (easy_fp_constant): Add support for > generating XXSPLTIW. > (vsx_prefixed_constant): New predicate. > (easy_vector_constant): Add support for > generating XXSPLTIW. > * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New > declaration. > (constant_generates_xxspltiw): Likewise. > * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate > XXSPLTIW, don't do XXSPLTIB and sign extend. Perhaps just 'generate XXSPLTIW if possible'. > (output_vec_const_move): Add support for XXSPLTIW. > (prefixed_xxsplti_p): New function. > (constant_generates_xxspltiw): New function. > * config/rs6000/rs6000.md (prefixed attribute): Add support to > mark XXSPLTI* instructions as being prefixed. > * config/rs6000/rs6000.opt (-msplat-word-constant): New debug > switch. > * config/rs6000/vsx.md (vsx_mov_64bit): Add support for > generating XXSPLTIW or XXSPLTIDP. > (vsx_mov_32bit): Likewise. > * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the > eP constraint. > > gcc/testsuite/ > > * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test. > * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test. 
> * gcc.target/powerpc/vec-splat-constant-v4si.c: New test. > * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test. > * gcc.target/powerpc/vec-splati-runnable.c: Update insn count. > --- > gcc/config/rs6000/constraints.md | 6 ++ > gcc/config/rs6000/predicates.md | 46 ++- > gcc/config/rs6000/rs6000-protos.h | 2 + > gcc/config/rs6000/rs6000.c| 81 +++ > gcc/config/rs6000/rs6000.md | 5 ++ > gcc/config/rs6000/rs6000.opt | 4 + > gcc/config/rs6000/vsx.md | 28 +++ > gcc/doc/md.texi | 4 + > .../powerpc/vec-splat-constant-v16qi.c| 27 +++ > .../powerpc/vec-splat-constant-v4sf.c | 67 +++ > .../powerpc/vec-splat-constant-v4si.c | 51 > .../powerpc/vec-splat-constant-v8hi.c | 62 ++ > .../gcc.target/powerpc/vec-splati-runnable.c | 4 +- > 13 files changed, 369 insertions(+), 18 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c > > diff --git a/gcc/config/rs6000/constraints.md > b/gcc/config/rs6000/constraints.md > index e72132b4c28..a4b05837fa6 100644 > --- a/gcc/config/rs6000/constraints.md > +++ b/gcc/config/rs6000/constraints.md > @@ -213,6 +213,12 @@ (define_constraint "eI" >"A signed 34-bit integer constant if prefixed instructions are supported." >(match_operand 0 "cint34_operand")) > > +;; A SF/DF scalar constant or a vector constant that can be loaded into > vector > +;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW. > +(define_constraint "eP" > + "A constant that can be loaded into a VSX register with one prefixed insn." > + (match_operand 0 "vsx_prefixed_constant")) > + > ;; A TF/KF scalar constant or a vector constant that can load certain IEEE > ;; 128-bit constants into vector registers using LXVKQ. 
> (define_constraint "eQ" > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index e0d1c718e9f..ed6252bd0c4 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant" >vec_const_128bit_type vsx_const; >if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const)) > { > - if (constant_generates_lxvkq (&vsx_const) != 0) > + if (constant_generates_lxvkq (&vsx_const)) > + return true; > + > + if (constant_generates_xxspltiw (&vsx_const)) > return true; > } > o
Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP for vectors on power10.
>
> This patch implements XXSPLTIDP support for all vector constants.  The
> XXSPLTIDP instruction is given a 32-bit immediate that is converted to a
> vector
> of two DFmode constants.  The immediate is in SFmode format, so only constants
> that fit as SFmode values can be loaded with XXSPLTIDP.
>
> The constraint (eP) added in the previous patch for XXSPLTIW is also used
> for XXSPLTIDP.
>

ok

> DImode scalar constants are not handled.  This is due to the majority of
> DImode
> constants will be in the GPR registers.  With vector registers, you have the
> problem that XXSPLTIDP splats the double word into both elements of the
> vector.  However, if TImode is loaded with an integer constant, it wants a
> full
> 128-bit constant.

This may be worth adding as a TODO comment somewhere in the code.

>
> SFmode and DFmode scalar constants are not handled in this patch.  The
> support for for those constants will be in the next patch.

ok

>
> I have added a temporary switch (-msplat-float-constant) to control whether or
> not the XXSPLTIDP instruction is generated.
>
> I added 2 new tests to test loading up V2DI and V2DF vector constants.
>
> 2021-11-05  Michael Meissner
>
> gcc/
>
> 	* config/rs6000/predicates.md (easy_fp_constant): Add support for
> 	generating XXSPLTIDP.
> 	(vsx_prefixed_constant): Likewise.
> 	(easy_vector_constant): Likewise.
> 	* config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
> 	New declaration.
> 	* config/rs6000/rs6000.c (output_vec_const_move): Add support for
> 	generating XXSPLTIDP.
> 	(prefixed_xxsplti_p): Likewise.
> 	(constant_generates_xxspltidp): New function.
> 	* config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
>
> gcc/testsuite/
>
> 	* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
> 	regex for power10.
> 	* gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> * gcc.target/powerpc/vec-splat-constant-v2di.c: New test. > --- ok > gcc/config/rs6000/predicates.md | 9 ++ > gcc/config/rs6000/rs6000-protos.h | 1 + > gcc/config/rs6000/rs6000.c| 108 ++ > gcc/config/rs6000/rs6000.opt | 4 + > .../powerpc/pr86731-fwrapv-longlong.c | 9 +- > .../powerpc/vec-splat-constant-v2df.c | 64 +++ > .../powerpc/vec-splat-constant-v2di.c | 50 > 7 files changed, 241 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c > > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index ed6252bd0c4..d748b11857c 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant" > >if (constant_generates_xxspltiw (&vsx_const)) > return true; > + > + if (constant_generates_xxspltidp (&vsx_const)) > + return true; > } > >/* Otherwise consider floating point constants hard, so that the > @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant" >if (constant_generates_xxspltiw (&vsx_const)) > return true; > > + if (constant_generates_xxspltidp (&vsx_const)) > +return true; > + >return false; > }) > > @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant" > > if (constant_generates_xxspltiw (&vsx_const)) > return true; > + > + if (constant_generates_xxspltidp (&vsx_const)) > + return true; > } ok > >if (TARGET_P9_VECTOR > diff --git a/gcc/config/rs6000/rs6000-protos.h > b/gcc/config/rs6000/rs6000-protos.h > index 99c6a671289..2d28df7442d 100644 > --- a/gcc/config/rs6000/rs6000-protos.h > +++ b/gcc/config/rs6000/rs6000-protos.h > @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, machine_mode, > vec_const_128bit_type *); > extern unsigned constant_generates_lxvkq (vec_const_128bit_type *); > extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *); > +extern unsigned 
constant_generates_xxspltidp (vec_const_128bit_type *); > #endif /* RTX_CODE */ > > #ifdef TREE_CODE > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index be24f56eb31..8fde48cf2b3 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx *operands) > operands[2] = GEN_INT (imm); > return "xxspltiw %x0,%2"; > } > + > + imm = constant_generates_xxspltidp (&vsx_const); > + if (imm) Just a nit that the two lines could be combined into a similar form as used elsewhere as ... if (constant_generates_xxsp
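On the "only constants that fit as SFmode values" point from the description, the criterion can be sketched in plain C as a round-trip test. This is only an illustration and not the logic of constant_generates_xxspltidp, which may apply further conditions that I have not modelled here. The name double_fits_in_float is hypothetical, the sketch assumes float is IEEE binary32, and as a simplification it rejects NaN and infinities outright.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 1 and store the 32-bit single precision
   image in *IMM when D converts to float and back without change.
   NaN, infinities and values outside the float range are rejected
   (a simplification for this sketch).  */
static int
double_fits_in_float (double d, uint32_t *imm)
{
  float f;

  assert (sizeof (float) == sizeof (uint32_t));

  /* Written this way so that NaN fails the test as well.  */
  if (!(d >= -FLT_MAX && d <= FLT_MAX))
    return 0;

  f = (float) d;
  if ((double) f != d)
    return 0;			/* precision was lost */

  memcpy (imm, &f, sizeof f);	/* bit image of the float */
  return 1;
}
```

For example, 1.5 survives the round trip and has the single precision image 0x3fc00000, while 0.1 does not survive and so would need some other way of being loaded.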
Re: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
On Fri, 2021-11-05 at 00:11 -0400, Michael Meissner wrote: > Generate XXSPLTIDP for scalars on power10. > > This patch implements XXSPLTIDP support for SF, and DF scalar constants. > The previous patch added support for vector constants. This patch adds > the support for SFmode and DFmode scalar constants. > > I added 2 new tests to test loading up SF and DF scalar constants. ok > > 2021-11-05 Michael Meissner > > gcc/ > > * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec. > (UNSPEC_XXSPLTIW_CONST): New unspec. > (movsf_hardfloat): Add support for generating XXSPLTIDP. > (mov_hardfloat32): Likewise. > (mov_hardfloat64): Likewise. > (xxspltidp__internal): New insns. > (xxspltiw__internal): New insns. > (splitters for SF/DFmode): Add new splitters for XXSPLTIDP. > > gcc/testsuite/ > > * gcc.target/powerpc/vec-splat-constant-df.c: New test. > * gcc.target/powerpc/vec-splat-constant-sf.c: New test. > --- ok > gcc/config/rs6000/rs6000.md | 97 +++ > .../powerpc/vec-splat-constant-df.c | 60 > .../powerpc/vec-splat-constant-sf.c | 60 > 3 files changed, 199 insertions(+), 18 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 3a7bcd2426e..4122acb98cf 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -156,6 +156,8 @@ (define_c_enum "unspec" > UNSPEC_PEXTD > UNSPEC_HASHST > UNSPEC_HASHCHK > + UNSPEC_XXSPLTIDP_CONST > + UNSPEC_XXSPLTIW_CONST >]) > > ;; > @@ -7764,17 +7766,17 @@ (define_split > ;; > ;; LWZ LFSLXSSP LXSSPX STFS STXSSP > ;; STXSSPX STWXXLXOR LI FMRXSCPSGNDP > -;; MR MT MF NOP > +;; MR MT MF NOPXXSPLTIDP > > (define_insn "movsf_hardfloat" >[(set (match_operand:SF 0 "nonimmediate_operand" >"=!r, f, v, wa,m, wY, > Z, m, wa, !r,f, wa, > - !r,*c*l, !r, *h") > + !r,*c*l, !r, *h,wa") > (match_operand:SF 1 "input_operand" >"m, 
m, wY, Z, f, v, > wa,r, j, j, f, wa, > - r, r, *h, 0"))] > + r, r, *h, 0, eP"))] >"(register_operand (operands[0], SFmode) > || register_operand (operands[1], SFmode)) > && TARGET_HARD_FLOAT > @@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat" > mr %0,%1 > mt%0 %1 > mf%1 %0 > - nop" > + nop > + #" >[(set_attr "type" > "load, fpload,fpload, fpload,fpstore, fpstore, >fpstore,store, veclogical, integer, fpsimple, fpsimple, > - *, mtjmpr,mfjmpr, *") > + *, mtjmpr,mfjmpr, *, vecperm") > (set_attr "isa" > "*, *, p9v,p8v, *, p9v, >p8v,*, *, *, *, *, > - *, *, *, *")]) > + *, *, *, *, p10")]) > > ;; LWZ LFIWZX STWSTFIWX MTVSRWZMFVSRWZ > ;; FMR MR MT%0 MF%1 NOP > @@ -8064,18 +8067,18 @@ (define_split > > ;; STFD LFD FMR LXSDSTXSD > ;; LXSD STXSD XXLOR XXLXOR GPR<-0 > -;; LWZ STW MR > +;; LWZ STW MR XXSPLTIDP > > > (define_insn "*mov_hardfloat32" >[(set (match_operand:FMOVE64 0 "nonimmediate_operand" > "=m, d, d, , wY, >, Z, , , !r, > - Y, r, !r") > + Y, r, !r, wa") > (match_operand:FMOVE64 1 "input_operand" > "d, m, d, wY, , >Z, , , , , > - r, Y, r"))] > + r, Y, r, eP"))] >"! TARGET_POWERPC64 && TARGET_HARD_FLOAT > && (gpc_reg_operand (operands[0], mode) > || gpc_reg_operand (operands[1], mode))" > @@ -8092,20 +8095,21 @@ (define_insn "*mov_hardfloat32" > # > # > # > + # > #" >[(set_attr "type" > "fpstore, fpload, fpsimple, fpload, fpstore, > fpload, fpstore,veclogical, veclogical, two, > - store, load, two") > +
Re: [PATCH 1/2] Add IEEE 128-bit min/max support on PowerPC.
On Tue, 2021-05-18 at 16:26 -0400, Michael Meissner wrote: > [PATCH 1/2] Add IEEE 128-bit min/max support on PowerPC. > Hi, > This patch adds the support for the IEEE 128-bit floating point C minimum and > maximum instructions. The next patch will add the support for using the > compare and set mask instruction to implement conditional moves. > > This patch does not try to re-use the code used for SF/DF min/max > support. It defines a separate insn for the IEEE 128-bit support. It > uses the code iterator to simplify adding both operations. > > GCC will not convert ?: operations into using min/max instructions provided in I'd throw the ternary term in there, easier to search for later. s/?: operations/ternary (?:) operations / > this patch unless the user uses -Ofast or similar switches due to issues with > NaNs. The next patch that adds conditional move instructions will enable the > ?: conversion in many cases. > > I have done bootstrap builds with this patch on the following 3 systems: > 1)power9 running LE Linux using --with-cpu=power9 > 2)power8 running BE Linux using --with-cpu=power8, testing both > 32/64-bit. > 3)power10 prototype running LE Linux using --with-cpu=power10. > > There were no regressions to the tests, and the new test added passed. Can I > check these patches into trunk branch for GCC 12? > > I would like to check these patches into GCC 11 after a cooling off period, > but > I can also not do the backport if desired. > > gcc/ > 2021-05-18 Michael Meissner > > * config/rs6000/rs6000.c (rs6000_emit_minmax): Add support for ISA > 3.1 IEEE 128-bit floating point xsmaxcqp and xsmincqp > instructions. > * config/rs6000/rs6000.md (s3, IEEE128 iterator): > New insns. ok > > gcc/testsuite/ > 2021-05-18 Michael Meissner > > * gcc.target/powerpc/float128-minmax-2.c: New test. > * gcc.target/powerpc/float128-minmax.c: Turn off power10 code > generation. 
So, presumably the float128-minmax-2.c test adds/replaces the power10 code gen tests that were removed or disabled from float128-minmax.c. > --- > gcc/config/rs6000/rs6000.c| 3 ++- > gcc/config/rs6000/rs6000.md | 11 +++ > .../gcc.target/powerpc/float128-minmax-2.c| 15 +++ > .../gcc.target/powerpc/float128-minmax.c | 7 +++ > 4 files changed, 35 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 0d05956..fdaf12aeda0 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -16111,7 +16111,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx > op0, rtx op1) >/* VSX/altivec have direct min/max insns. */ >if ((code == SMAX || code == SMIN) >&& (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) > - || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode > + || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode)) > + || (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode > { >emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1))); >return; > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 0bfeb24d9e8..3a1bc1f8547 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -5196,6 +5196,17 @@ (define_insn "*s3_vsx" > } >[(set_attr "type" "fp")]) > > +;; Min/max for ISA 3.1 IEEE 128-bit floating point > +(define_insn "s3" > + [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v") > + (fp_minmax:IEEE128 > + (match_operand:IEEE128 1 "altivec_register_operand" "v") > + (match_operand:IEEE128 2 "altivec_register_operand" "v")))] > + "TARGET_POWER10" > + "xscqp %0,%1,%2" > + [(set_attr "type" "vecfloat") > + (set_attr "size" "128")]) > + > ;; The conditional move instructions allow us to perform max and min > operations > ;; even when we don't have the appropriate max/min instruction using the FSEL > ;; instruction. 
ok > diff --git a/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c > b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c > new file mode 100644 > index 000..c71ba08c9f8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c > @@ -0,0 +1,15 @@ > +/* { dg-require-effective-target ppc_float128_hw } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */ > + > +#ifndef TYPE > +#define TYPE _Float128 > +#endif > + > +/* Test that the fminf128/fmaxf128 functions generate if/then/else and not a > + call. */ > +TYPE f128_min (TYPE a, TYPE b) { return __builtin_fminf128 (a, b); } > +TYPE f128_max (TYPE a, TYPE b) { return __builtin_fmaxf128 (a, b); } > + > +/* { dg-final { scan-assembler {\mxsmaxcqp\M} }
Re: [PATCH 2/2] Add IEEE 128-bit fp conditional move on PowerPC.
On Tue, 2021-05-18 at 16:28 -0400, Michael Meissner wrote:
> [PATCH 2/2] Add IEEE 128-bit fp conditional move on PowerPC.
> Hi,
> This patch adds the support for power10 IEEE 128-bit floating point
> conditional move and for automatically generating min/max.
>
> In this patch, I simplified things compared to previous patches.  Instead of
> allowing any four of the modes to be used for the conditional move comparison
> and the move itself could use different modes, I restricted the conditional
> move to just the same mode.  I.e. you can do:

ok.

>
>     _Float128 a, b, c, d, e, r;
>
>     r = (a == b) ? c : d;
>
> But you can't do:
>
>     _Float128 c, d, r;
>     double a, b;
>
>     r = (a == b) ? c : d;
>
> or:
>
>     _Float128 a, b;
>     double c, d, r;
>
>     r = (a == b) ? c : d;
>
> This eliminates a lot of the complexity of the code, because you don't have to
> worry about the sizes being different, and the IEEE 128-bit types being
> restricted to Altivec registers, while the SF/DF modes can use any VSX
> register.
>
> I did not modify the existing support that allowed conditional moves where
> SFmode operands are compared and DFmode operands are moved (and vice versa).
>
> I modified the test cases that I added to reflect this change.  I have also
> fixed the test for not equal to use '!=' instead of '=='.
>
> I have done bootstrap builds with this patch on the following 3 systems:
> 1)  power9 running LE Linux using --with-cpu=power9
> 2)  power8 running BE Linux using --with-cpu=power8, testing both
>     32/64-bit.
> 3)  power10 prototype running LE Linux using --with-cpu=power10.
>
> There were no regressions to the tests, and the new test added passed.  Can I
> check these patches into trunk branch for GCC 12?
>
> I would like to check these patches into GCC 11 after a cooling off period,
> but I can also not do the backport if desired.
>
> gcc/
> 2021-05-18  Michael Meissner
>
>     * config/rs6000/rs6000.c (rs6000_maybe_emit_fp_cmove): Add IEEE
>     128-bit floating point conditional move support.
>     (have_compare_and_set_mask): Add IEEE 128-bit floating point
>     types.
>     * config/rs6000/rs6000.md (movcc, IEEE128 iterator): New insn.
>     (movcc_p10, IEEE128 iterator): New insn.
>     (movcc_invert_p10, IEEE128 iterator): New insn.
>     (fpmask, IEEE128 iterator): New insn.
>     (xxsel, IEEE128 iterator): New insn.
>
> gcc/testsuite/
> 2021-05-18  Michael Meissner
>
>     * gcc.target/powerpc/float128-cmove.c: New test.
>     * gcc.target/powerpc/float128-minmax-3.c: New test.

ok

> ---
>  gcc/config/rs6000/rs6000.c                    |  38 ++-
>  gcc/config/rs6000/rs6000.md                   | 106 ++
>  .../gcc.target/powerpc/float128-cmove.c       |  58 ++
>  .../gcc.target/powerpc/float128-minmax-3.c    |  15 +++
>  4 files changed, 215 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-cmove.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index fdaf12aeda0..ef1ebaaee05 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15706,8 +15706,8 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
>    return 1;
>  }
>
> -/* Possibly emit the xsmaxcdp and xsmincdp instructions to emit a maximum or
> -   minimum with "C" semantics.
> +/* Possibly emit the xsmaxc{dp,qp} and xsminc{dp,qp} instructions to emit a
> +   maximum or minimum with "C" semantics.
>
>     Unless you use -ffast-math, you can't use these instructions to replace
>     conditions that implicitly reverse the condition because the comparison
> @@ -15783,6 +15783,7 @@ rs6000_maybe_emit_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
>    enum rtx_code code = GET_CODE (op);
>    rtx op0 = XEXP (op, 0);
>    rtx op1 = XEXP (op, 1);
> +  machine_mode compare_mode = GET_MODE (op0);
>    machine_mode result_mode = GET_MODE (dest);
>    rtx compare_rtx;
>    rtx cmove_rtx;
> @@ -15791,6 +15792,35 @@ rs6000_maybe_emit_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
>    if (!can_create_pseudo_p ())
>      return 0;
>
> +  /* We allow the comparison to be either SFmode/DFmode and the true/false
> +     condition to be either SFmode/DFmode.  I.e. we allow:
> +
> +         float a, b;
> +         double c, d, r;
> +
> +         r = (a == b) ? c : d;
> +
> +     and:
> +
> +         double a, b;
> +         float c, d, r;
> +
> +         r = (a == b) ? c : d;

This new comment does not seem to align with the comments in the
description, which state "But you can't do ..."

> +
> +     but we don't allow intermixing the IEEE 128-bit floating point types with
> +     the 32/64-bit scalar types.
> +
> +     It gets too messy where SFmode/DFmode can use any register and TFm
Re: [PATCH] Fix long double tests when default long double is not IBM.
On Tue, 2021-05-18 at 16:32 -0400, Michael Meissner wrote:
> [PATCH] Fix long double tests when default long double is not IBM.
> Hi,
> This patch adds 3 more selections to target-supports.exp to see if we can
> force the compiler to use a particular long double format (IEEE 128-bit, IBM
> extended double, 64-bit), and the library support will track the changes for
> the long double.  This is needed because two of the tests in the test suite
> use long double, and they are actually testing IBM extended double.
>
> This patch also forces the two tests that explicitly require long double
> to use the IBM double-double encoding to explicitly run the test.  This
> requires GLIBC 2.32 or greater in order to do the switch.
>
> I have run tests on a little endian power9 system with 3 compilers.  There
> were no regressions with these patches, and the two tests in the following
> patches now work if the default long double is not IBM 128-bit:
>
>     * One compiler used the default IBM 128-bit format;
>     * One compiler used the IEEE 128-bit format; (and)
>     * One compiler used 64-bit long doubles.
>
> I have also tested compilers on a big endian power8 system with a compiler
> defaulting to power8 code generation and another with the default cpu
> set.  There were no regressions.
>
> Can I check this patch into the master branch?
>
> I have done bootstrap builds with this patch on the following 4 systems:
> 1)  power9 running LE Linux using --with-cpu=power9 with long
>     double == IBM
> 2)  power9 running LE Linux using --with-cpu=power9 with long
>     double == IEEE
> 3)  power8 running BE Linux using --with-cpu=power8, testing both
>     32/64-bit.
> 4)  power10 prototype running LE Linux using --with-cpu=power10.
>
> There were no regressions to the tests, and the two test cases that previously
> failed with I ran the compiler defaulting to long double using IEEE 128-bit
> now passed.  Can I check these patches into trunk branch for GCC 12?
>
> I would like to check these patches into GCC 11 after a cooling off period,
> but I can also not do the backport if desired.
>
> gcc/testsuite/
> 2021-05-18  Michael Meissner
>
>     PR target/70117
>     * gcc.target/powerpc/pr70117.c: Force the long double type to use
>     the IBM 128-bit format.
>     * c-c++-common/dfp/convert-bfp-11.c: Force using IBM 128-bit long
>     double.  Remove check for 64-bit long double.
>     * lib/target-supports.exp
>     (add_options_for_ppc_long_double_override_ibm128): New function.
>     (check_effective_target_ppc_long_double_override_ibm128): New
>     function.
>     (add_options_for_ppc_long_double_override_ieee128): New function.
>     (check_effective_target_ppc_long_double_override_ieee128): New
>     function.
>     (add_options_for_ppc_long_double_override_64bit): New function.
>     (check_effective_target_ppc_long_double_override_64bit): New
>     function.

ok.

> ---
>  .../c-c++-common/dfp/convert-bfp-11.c         |  18 +--
>  gcc/testsuite/gcc.target/powerpc/pr70117.c    |   6 +-
>  gcc/testsuite/lib/target-supports.exp         | 107 ++
>  3 files changed, 121 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
> index 95c433d2c24..35da07d1fa4 100644
> --- a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
> +++ b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
> @@ -1,9 +1,14 @@
> -/* { dg-skip-if "" { ! "powerpc*-*-linux*" } } */
> +/* { dg-require-effective-target dfp } */
> +/* { dg-require-effective-target ppc_long_double_override_ibm128 } */
> +/* { dg-add-options ppc_long_double_override_ibm128 } */
>
> -/* Test decimal float conversions to and from IBM 128-bit long double.
> -   Checks are skipped at runtime if long double is not 128 bits.
> -   Don't force 128-bit long doubles because runtime support depends
> -   on glibc.  */
> +/* We force the long double type to be IBM 128-bit because the CONVERT_TO_PINF
> +   tests will fail if we use IEEE 128-bit floating point.  This is due to IEEE
> +   128-bit having a larger exponent range than IBM 128-bit extended double.  So
> +   tests that would generate an infinity with IBM 128-bit will generate a
> +   normal number with IEEE 128-bit.  */

ok

> +
> +/* Test decimal float conversions to and from IBM 128-bit long double.  */
>
>  #include "convert.h"
>
> @@ -36,9 +41,6 @@ CONVERT_TO_PINF (312, tf, sd, 1.6e+308L, d32)
>  int
>  main ()
>  {
> -  if (sizeof (long double) != 16)
> -    return 0;
> -
>    convert_101 ();
>    convert_102 ();
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr70117.c b/gcc/testsuite/gcc.target/powerpc/pr70117.c
> index 3bbd2c595e0..8a5fad1dee0 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr70117.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70117.c
> @@ -1,5 +1,7 @@
> -/* { dg-do run { target {
Re: [PATCH] Allow __ibm128 on older PowerPC systems.
On Tue, 2021-05-18 at 16:36 -0400, Michael Meissner wrote:
> [PATCH] Allow __ibm128 on older PowerPC systems.
> Hi,
> On January 8th, 2018, I added code to ibm-ldouble.c to use the built-in
> function __builtin_pack_ibm128 if long double is IEEE 128-bit and continue to
> use __builtin_pack_longdouble if long double is IBM extended double.  This
> code was needed because __builtin_pack_ibm128 is not available unless the
> __ibm128 keyword is availabe.  In the current code, __ibm128 is only enabled
> if we have support for both IBM and IEEE 128-bit long double.

"available."

May be worth re-sifting the description to drop the history not directly
applicable to what this patch is doing.

>
> Segher suggested that instead I should make __ibm128, __builtin_pack_ibm128,
> and __builtin_unpack_ibm128 available on older systems that don't support IEEE
> 128-bit floating point but does support the IBM extended double floating
> point.
>
> This patch changes the code so that __ibm128 is now exported if either
> long double uses the IBM extended double format, or IEEE 128-bit floating
> point is available.
>
> I changed the internal built-in types from float128 to ibm128, since the
> only built-in functions that use this are __builtin_pack_ibm128 and
> __builtin_unpack_ibm128, and the new name matches the function.
>
> In addition, this patch changes the function within libgcc that handles
> IBM long double to use the __builtin_pack_ibm128 function.

ok

>
> I have done bootstrap builds with this patch on the following 3 systems:
> 1)  power9 running LE Linux using --with-cpu=power9
> 2)  power8 running BE Linux using --with-cpu=power8, testing both
>     32/64-bit.
> 3)  power10 prototype running LE Linux using --with-cpu=power10.
>
> There were no regressions to the tests, and the new test added passed.  Can I
> check these patches into trunk branch for GCC 12?
>
> At the moment, I'm not sure this should be backported to GCC 11.  But I can
> easily do the back port after a stabilizing period.
>
> gcc/
> 2021-05-18  Michael Meissner
>
>     * config/rs6000/rs6000-builtin.def (BU_IBM128_2): Rename
>     RS6000_BTM_IBM128 from RS6000_BTM_FLOAT128.
>     * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Update
>     error message for __ibm128 built-in functions.
>     (rs6000_init_builtins): Create the __ibm128 keyword on older
>     systems where long double uses the IBM extended double format,
>     even if they don't support IEEE 128-bit floating point.

Could drop 'older', ok.

>     * config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Rename
>     RS6000_BTM_IBM128 from RS6000_BTM_FLOAT128.
>     (rs6000_builtin_mask_names): Rename RS6000_BTM_IBM128 from
>     RS6000_BTM_FLOAT128.
>     * config/rs6000/rs6000.h (TARGET_IBM128): New macro.
>     (RS6000_BTM_IBM128): Rename from RS6000_BTM_FLOAT128.
>     (RS6000_BTM_COMMON): Rename RS6000_BTM_IBM128 from
>     RS6000_BTM_FLOAT128.

ok

>
> libgcc/
> 2021-05-18  Michael Meissner
>
>     * config/rs6000/ibm-ldouble.c (pack_ldouble): Use
>     __builtin_pack_ibm128 instead of __builtin_pack_longdouble.
> ---
>  gcc/config/rs6000/rs6000-builtin.def |  5 ++---
>  gcc/config/rs6000/rs6000-call.c      | 14 ++
>  gcc/config/rs6000/rs6000.c           |  4 ++--
>  gcc/config/rs6000/rs6000.h           | 12 +---
>  libgcc/config/rs6000/ibm-ldouble.c   |  4 ++--
>  5 files changed, 25 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 609bebdfd74..6d82ed224fb 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -796,13 +796,12 @@
>                       | RS6000_BTC_BINARY),                            \
>                      CODE_FOR_ ## ICODE)                 /* ICODE */
>
> -/* 128-bit __ibm128 floating point builtins (use -mfloat128 to indicate that
> -   __ibm128 is available).  */
> +/* 128-bit __ibm128 floating point builtins.  */
>  #define BU_IBM128_2(ENUM, NAME, ATTR, ICODE)                          \
>    RS6000_BUILTIN_2 (MISC_BUILTIN_ ## ENUM,              /* ENUM */    \
>                      "__builtin_" NAME,                  /* NAME */    \
>                      (RS6000_BTM_HARD_FLOAT              /* MASK */    \
> -                     | RS6000_BTM_FLOAT128),                          \
> +                     | RS6000_BTM_IBM128),                            \
>                      (RS6000_BTC_ ## ATTR                /* ATTR */    \
>                       | RS6000_BTC_BINARY),                            \
>                      CODE_FOR_ ## ICODE)                 /* ICODE */

ok

> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index c4332a61862..7bdc4eeca5f 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -11540,8 +11540,8 @@ rs6000_invalid_buil
Re: [PATCH] Change rs6000_const_f32_to_i32 return type.
On Tue, 2021-05-18 at 16:39 -0400, Michael Meissner wrote:
> [PATCH] Change rs6000_const_f32_to_i32 return type.
>
> The function rs6000_const_f32_to_i32 called REAL_VALUE_TO_TARGET_SINGLE
> with a long long type and returns it.  This patch changes the type to long
> which is the proper type for REAL_VALUE_TO_TARGET_SINGLE.

ok

That seems consistent with the tm.texi blurb:

  For @code{REAL_VALUE_TO_TARGET_SINGLE} and
  @code{REAL_VALUE_TO_TARGET_DECIMAL32}, this variable should be a simple
  @code{long int}.

>
> I have done bootstraps on little endian power9 and big endian power8 systems.
> Can I check this into the trunk?
>
> This does not need to go into GCC 11, unless some of the other patches that
> use this function are also back ported.
>
> gcc/
> 2021-05-18  Michael Meissner
>
>     * config/rs6000/rs6000-protos.h (rs6000_const_f32_to_i32): Change
>     return type to long.
>     * config/rs6000/rs6000.c (rs6000_const_f32_to_i32): Change return
>     type to long.
> ---
>  gcc/config/rs6000/rs6000-protos.h | 2 +-
>  gcc/config/rs6000/rs6000.c        | 6 --
>  2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index bef727e0a64..c407034d58c 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -282,7 +282,7 @@ extern void rs6000_asm_output_dwarf_pcrel (FILE *file, int size,
>                                             const char *label);
>  extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
>                                               const char *label);
> -extern long long rs6000_const_f32_to_i32 (rtx operand);
> +extern long rs6000_const_f32_to_i32 (rtx operand);
>
>  /* Declare functions in rs6000-c.c */

ok

>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 86f53297cb9..ef1ebaaee05 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -27937,10 +27937,12 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
>    return NULL;
>  }
>
> -long long
> +/* Convert a SFmode constant to the integer bit pattern.  */
> +
> +long
>  rs6000_const_f32_to_i32 (rtx operand)
>  {
> -  long long value;
> +  long value;
>    const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand);

ok

Thanks
-Will

>
>    gcc_assert (GET_MODE (operand) == SFmode);
> --
> 2.31.1
>
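
As an aside, the idea of carrying a single-precision bit pattern in a plain long can be sketched outside of GCC. The following is only an illustration, not the rs6000 code: f32_bits is a hypothetical helper, and it assumes the host float is IEEE 754 binary32, whereas the real function operates on an rtx through CONST_DOUBLE_REAL_VALUE and REAL_VALUE_TO_TARGET_SINGLE. Since C guarantees that long has at least 32 bits, a long is wide enough for the pattern and long long buys nothing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return the 32-bit pattern of a
   host float, carried in a long.  Assumes float is IEEE 754 binary32,
   so that sizeof (float) == sizeof (uint32_t).  */
long
f32_bits (float value)
{
  uint32_t bits;

  /* memcpy reinterprets the object representation without violating
     the aliasing rules.  */
  memcpy (&bits, &value, sizeof bits);
  return (long) bits;
}
```

Under that assumption, f32_bits (1.0f) yields 0x3f800000.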
Re: [PATCH 1/2] Move xx* builtins to vsx.md.
On Tue, 2021-05-18 at 16:46 -0400, Michael Meissner wrote:
> [PATCH 1/2] Move xx* builtins to vsx.md.
> Hi,
> I noticed that the xx built-in functions (xxspltiw, xxspltidp, xxsplti32dx,
> xxeval, xxblend, and xxpermx) were all defined in altivec.md.  However, since
> the XX instructions can take both traditional floating point and Altivec
> registers, these built-in functions should be in vsx.md.
>
> This patch just moves the insns from altivec.md to vsx.md.
>
> I also moved the VM3 mode iterator and VM3_char mode attribute from altivec.md
> to vsx.md, since the only use of these were for the XXBLEND insns.
>
> I have bootstraped this on LE power9 and BE power8 systems.  There were no
> regressions in the tests.  Can I check this into the trunk?
>
> I do not expect to back port this to GCC 11 unless we will be back porting the
> future patches that add support for the XXSPLITW, XXSPLTIDP, and XXSPLTI32DX
> instructions.
>
> gcc/
> 2021-05-18  Michael Meissner
>
>     * config/rs6000/altivec.md (UNSPEC_XXEVAL): Move to vsx.md.
>     (UNSPEC_XXSPLTIW): Move to vsx.md.
>     (UNSPEC_XXSPLTID): Move to vsx.md.
>     (UNSPEC_XXSPLTI32DX): Move to vsx.md.
>     (UNSPEC_XXBLEND): Move to vsx.md.
>     (UNSPEC_XXPERMX): Move to vsx.md.
>     (VM3): Move to vsx.md.
>     (VM3_char): Move to vsx.md.
>     (xxspltiw_v4si): Move to vsx.md.
>     (xxspltiw_v4sf): Move to vsx.md.
>     (xxspltiw_v4sf_inst): Move to vsx.md.
>     (xxspltidp_v2df): Move to vsx.md.
>     (xxspltidp_v2df_inst): Move to vsx.md.
>     (xxsplti32dx_v4si_inst): Move to vsx.md.
>     (xxsplti32dx_v4sf): Move to vsx.md.
>     (xxsplti32dx_v4sf_inst): Move to vsx.md.
>     (xxblend_): Move to vsx.md.
>     (xxpermx): Move to vsx.md.
>     (xxpermx_inst): Move to vsx.md.
>     * config/rs6000/vsx.md (UNSPEC_XXEVAL): Move from altivec.md.
>     (UNSPEC_XXSPLTIW): Move from altivec.md.
>     (UNSPEC_XXSPLTID): Move from altivec.md.
>     (UNSPEC_XXSPLTI32DX): Move from altivec.md.
>     (UNSPEC_XXBLEND): Move from altivec.md.
>     (UNSPEC_XXPERMX): Move from altivec.md.
>     (VM3): Move from altivec.md.
>     (VM3_char): Move from altivec.md.
>     (xxspltiw_v4si): Move from altivec.md.
>     (xxspltiw_v4sf): Move from altivec.md.
>     (xxspltiw_v4sf_inst): Move from altivec.md.
>     (xxspltidp_v2df): Move from altivec.md.
>     (xxspltidp_v2df_inst): Move from altivec.md.
>     (xxsplti32dx_v4si_inst): Move from altivec.md.
>     (xxsplti32dx_v4sf): Move from altivec.md.
>     (xxsplti32dx_v4sf_inst): Move from altivec.md.
>     (xxblend_): Move from altivec.md.
>     (xxpermx): Move from altivec.md.
>     (xxpermx_inst): Move from altivec.md.
> ---
>  gcc/config/rs6000/altivec.md | 196 -
>  gcc/config/rs6000/vsx.md     | 204 +++
>  2 files changed, 204 insertions(+), 196 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 1351dafbc41..8a9f55c561b 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -171,16 +171,10 @@ (define_c_enum "unspec"
>     UNSPEC_VPEXTD
>     UNSPEC_VCLRLB
>     UNSPEC_VCLRRB
> -   UNSPEC_XXEVAL
>     UNSPEC_VSTRIR
>     UNSPEC_VSTRIL
>     UNSPEC_SLDB
>     UNSPEC_SRDB
> -   UNSPEC_XXSPLTIW
> -   UNSPEC_XXSPLTID
> -   UNSPEC_XXSPLTI32DX
> -   UNSPEC_XXBLEND
> -   UNSPEC_XXPERMX
>  ])
>
>  (define_c_enum "unspecv"
> @@ -221,21 +215,6 @@ (define_mode_iterator VM2 [V4SI
>                             (KF "FLOAT128_VECTOR_P (KFmode)")
>                             (TF "FLOAT128_VECTOR_P (TFmode)")])
>
> -;; Like VM2, just do char, short, int, long, float and double
> -(define_mode_iterator VM3 [V4SI
> -                           V8HI
> -                           V16QI
> -                           V4SF
> -                           V2DF
> -                           V2DI])
> -
> -(define_mode_attr VM3_char [(V2DI "d")
> -                            (V4SI "w")
> -                            (V8HI "h")
> -                            (V16QI "b")
> -                            (V2DF "d")
> -                            (V4SF "w")])
> -
>  ;; Map the Vector convert single precision to double precision for integer
>  ;; versus floating point
>  (define_mode_attr VS_sxwsp [(V4SI "sxw") (V4SF "sp")])
> @@ -820,169 +799,6 @@ (define_insn "vsdb_"
>    "vsdbi %0,%1,%2,%3"
>    [(set_attr "type" "vecsimple")])
>
> -(define_insn "xxspltiw_v4si"
> -  [(set (match_operand:V4SI 0 "register_operand" "=wa")
> -        (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
> -                     UNSPEC_XXSPLTIW))]
> -  "TARGET_POWER10"
> -  "xxspltiw %x0,%1"
> -  [(set_attr "type" "vecsimple")
> -   (set_attr "prefixed" "yes")])
> -
> -(define_expand "xxspltiw_v4sf"
> -  [(set (match_operand:V4SF 0 "register_operand" "=wa")
> -        (unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
> -                     UNSPEC_XXSPLTIW))]
> -  "TARGET_POWER10"
Re: [PATCH 2/2] Fix xxeval predicates.
On Tue, 2021-05-18 at 16:47 -0400, Michael Meissner wrote:
> [PATCH 2/2] Fix xxeval predicates.
>
> In doing the patch to move the XX* built-in functions from altivec.md to
> vsx.md, I noticed that the xxeval built-in function used the
> altivec_register_operand predicate.  Since it takes vsx registers, this
> might force the register allocate to issue a move when it could use a
> traditional floating point register.  This patch fixes that.

allocator ?

>
> gcc/
> 2021-05-18  Michael Meissner
>
>     * config/rs6000/vsx.md (xxeval): Use register_predicate instead of
>     altivec_register_predicate.
> ---
>  gcc/config/rs6000/vsx.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index a859038d399..15a8c0e22d8 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6410,9 +6410,9 @@ (define_insn "xxpermx_inst"
>  ;; XXEVAL built-in function support
>  (define_insn "xxeval"
>    [(set (match_operand:V2DI 0 "register_operand" "=wa")
> -        (unspec:V2DI [(match_operand:V2DI 1 "altivec_register_operand" "wa")
> -                      (match_operand:V2DI 2 "altivec_register_operand" "wa")
> -                      (match_operand:V2DI 3 "altivec_register_operand" "wa")
> +        (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "wa")
> +                      (match_operand:V2DI 2 "register_operand" "wa")
> +                      (match_operand:V2DI 3 "register_operand" "wa")
>                        (match_operand:QI 4 "u8bit_cint_operand" "n")]
>                       UNSPEC_XXEVAL))]
>    "TARGET_POWER10"
> --

ok

Thanks,
-Will

> 2.31.1
>