https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115868

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |jakub at gcc dot gnu.org
   Last reconfirmed|                            |2024-07-11

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
(gdb) l
1321            masks->rgc_vec.safe_grow_cleared (nvectors, true);
1322          rgroup_controls *rgm = &(*masks).rgc_vec[nvectors - 1];
1323          /* The number of scalars per iteration and the number of vectors are
1324             both compile-time constants.  */
1325          unsigned int nscalars_per_iter
1326              = exact_div (nvectors * TYPE_VECTOR_SUBPARTS (vectype),
1327                           LOOP_VINFO_VECT_FACTOR (loop_vinfo)).to_constant ();
1328
1329          if (rgm->max_nscalars_per_iter < nscalars_per_iter)
1330            {
(gdb) p nvectors
$1 = 1
(gdb) p debug_generic_expr (vectype)
vector(4) double
$2 = void
(gdb) p loop_vinfo->vectorization_factor 
$3 = {coeffs = {8}}
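To make the gdb values concrete: with nvectors = 1, TYPE_VECTOR_SUBPARTS (vectype) = 4 and a vectorization factor of 8, the exact_div at line 1326 is asked to compute 4 / 8, which is not exact. Below is a minimal sketch of that arithmetic. `nscalars_per_iter` is a hypothetical helper that uses plain unsigned integers instead of GCC's poly_int types. It returns std::nullopt where exact_div's precondition (dividend is a multiple of the divisor) would be violated, which is presumably what trips here.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for
//   exact_div (nvectors * TYPE_VECTOR_SUBPARTS (vectype),
//              LOOP_VINFO_VECT_FACTOR (loop_vinfo)).to_constant ()
// using plain integers. A non-exact division yields std::nullopt.
std::optional<unsigned> nscalars_per_iter(unsigned nvectors,
                                          unsigned nunits,
                                          unsigned vf) {
  unsigned scalars = nvectors * nunits;  // lanes covered by the mask vectors
  if (scalars % vf != 0)
    return std::nullopt;                 // exact_div precondition violated
  return scalars / vf;
}
```

With the gdb values, nscalars_per_iter (1, 4, 8) yields std::nullopt. Two mask vectors, as in nscalars_per_iter (2, 4, 8), give exactly one scalar per iteration.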

It looks like we create an inbranch variant (even if unused) and vectorize that.
The key might be the .MASK_CALL support here, which gets a bool argument, and
the mask and the result end up with different lane counts:

/space/rguenther/src/gcc-14-branch/gcc/testsuite/g++.dg/vect/pr68762.h:10:14: note:   ==> examining statement: _2 = .MASK_CALL (baz, d_1, _19);
/space/rguenther/src/gcc-14-branch/gcc/testsuite/g++.dg/vect/pr68762.h:10:14: note:   get vectype for scalar type: double
/space/rguenther/src/gcc-14-branch/gcc/testsuite/g++.dg/vect/pr68762.h:10:14: note:   vectype: vector(4) double
/space/rguenther/src/gcc-14-branch/gcc/testsuite/g++.dg/vect/pr68762.h:10:14: note:   nunits = 4
/space/rguenther/src/gcc-14-branch/gcc/testsuite/g++.dg/vect/pr68762.h:10:14: note:   ==> examining statement: _32 = ~_19;
/space/rguenther/src/gcc-14-branch/gcc/testsuite/g++.dg/vect/pr68762.h:10:14: note:   vectype: vector(8) <signed-boolean:1>
/space/rguenther/src/gcc-14-branch/gcc/testsuite/g++.dg/vect/pr68762.h:10:14: note:   nunits = 8

Something goes wrong with registering loop masks here, I think.  I suspect
that in vectorizable_simd_clone_call, when doing

            case SIMD_CLONE_ARG_TYPE_MASK:
              if (loop_vinfo
                  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
                vect_record_loop_mask (loop_vinfo,
                                       &LOOP_VINFO_MASKS (loop_vinfo),
                                       ncopies, vectype, op);

we're not properly considering that a SIMD call can have a larger vector
argument composed of _two_ inputs (though 'vectype' here is for the result).

Note that ncopies is based on the simdlen of the function.  As said, we can
at least handle input merging and destination splitting:

  _50 = {vect_d_1.253_41, vect_d_1.254_43};
  _51 = VIEW_CONVERT_EXPR<unsigned char>(mask__19.257_49);
  _52 = (unsigned int) _51;
  _53 = _Z3bazd.simdclone.7 (_50, _52);
  _54 = BIT_FIELD_REF <_53, 256, 0>;
  _55 = BIT_FIELD_REF <_53, 256, 256>;
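A scalar C++ emulation of the GIMPLE above may help. Two vector(4) double inputs are concatenated into one 8-lane argument, the 8-bit mask is widened to unsigned int, and the 512-bit result is split back into two 256-bit halves. This is a sketch only: `baz_simdclone8` and `call_merged` are hypothetical names, the body of baz is made up (the real one is in pr68762.h), and lane i is assumed to map to mask bit i.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4d = std::array<double, 4>;  // models vector(4) double
using V8d = std::array<double, 8>;  // models the 8-lane clone argument/result

// Hypothetical masked clone (stand-in for _Z3bazd.simdclone.7). The real
// body of baz is not shown here; this one doubles active lanes and zeroes
// inactive ones.
V8d baz_simdclone8(const V8d &d, unsigned mask) {
  assert(mask <= 0xFFu);  // only 8 mask bits are meaningful
  V8d r{};
  for (unsigned i = 0; i < 8; ++i)
    r[i] = ((mask >> i) & 1u) ? d[i] * 2.0 : 0.0;
  return r;
}

// Input merging (_50 = {lo, hi}) and destination splitting (the two
// BIT_FIELD_REFs of 256 bits at bit offsets 0 and 256).
std::array<V4d, 2> call_merged(const V4d &lo, const V4d &hi,
                               std::uint8_t mask) {
  V8d in{};
  for (unsigned i = 0; i < 4; ++i) {
    in[i] = lo[i];
    in[i + 4] = hi[i];
  }
  // _52 = (unsigned int) _51: widen the 8-bit mask for the call.
  V8d out = baz_simdclone8(in, static_cast<unsigned>(mask));
  std::array<V4d, 2> res{};
  for (unsigned i = 0; i < 4; ++i) {
    res[0][i] = out[i];      // BIT_FIELD_REF <_53, 256, 0>
    res[1][i] = out[i + 4];  // BIT_FIELD_REF <_53, 256, 256>
  }
  return res;
}
```

For example, call_merged ({1, 2, 3, 4}, {5, 6, 7, 8}, 0x10) activates only lane 4, so only the first element of the second half is computed (10.0).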

I do not remember whether we support non-uniform argument/return types
or if the SIMD ABI can require splitting an argument into two.

The following fixes this:

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 21e8fe98e44..73408f2c6d4 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4317,9 +4317,14 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
            case SIMD_CLONE_ARG_TYPE_MASK:
              if (loop_vinfo
                  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
-               vect_record_loop_mask (loop_vinfo,
-                                      &LOOP_VINFO_MASKS (loop_vinfo),
-                                      ncopies, vectype, op);
+               {
+                 unsigned mult
+                   = exact_div (bestn->simdclone->simdlen,
+                                TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
+                 vect_record_loop_mask (loop_vinfo,
+                                        &LOOP_VINFO_MASKS (loop_vinfo),
+                                        ncopies * mult, vectype, op);
+               }

              break;
            }
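The effect of the patch on the counts can be checked with a small integer model. It assumes ncopies = VF / simdlen (per the note that ncopies is based on the simdlen) and a simdlen of 8 for this PR, which the two merged inputs and the unsigned char mask suggest. It uses plain unsigned values rather than poly_ints. `record_mask` and `MaskRecord` are hypothetical names, not GCC API.

```cpp
#include <cassert>

// Hypothetical model of the patched SIMD_CLONE_ARG_TYPE_MASK case using
// plain integers. Assumes ncopies = vf / simdlen.
struct MaskRecord {
  unsigned nvectors;           // count passed to vect_record_loop_mask
  unsigned nscalars_per_iter;  // value vect_record_loop_mask derives from it
};

MaskRecord record_mask(unsigned vf, unsigned simdlen, unsigned nunits) {
  unsigned ncopies = vf / simdlen;        // clone calls per vector iteration
  assert(simdlen % nunits == 0);          // exact_div (simdlen, nunits)
  unsigned mult = simdlen / nunits;       // result vectors per clone call
  unsigned nvectors = ncopies * mult;     // ncopies * mult, as in the patch
  assert((nvectors * nunits) % vf == 0);  // exact_div in vect_record_loop_mask
  return {nvectors, nvectors * nunits / vf};
}
```

For VF 8, simdlen 8 and vector(4) double this gives ncopies = 1 and mult = 2, so two mask vectors are recorded and nscalars_per_iter becomes 2 * 4 / 8 = 1 instead of the non-exact 4 / 8.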
