On Mon, 24 Oct 2011, Richard Guenther wrote:

> On Thu, 20 Oct 2011, Jakub Jelinek wrote:
> 
> > On Thu, Oct 20, 2011 at 11:42:01AM +0200, Richard Guenther wrote:
> > > > +  if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR
> > > > +      && is_pattern_stmt_p (stmt_info))
> > > > +    scalar_dest = TREE_OPERAND (scalar_dest, 0);
> > > >    if (TREE_CODE (scalar_dest) != ARRAY_REF
> > > >        && TREE_CODE (scalar_dest) != INDIRECT_REF
> > > >        && TREE_CODE (scalar_dest) != COMPONENT_REF
> > > 
> > > Just change the if () stmt to
> > > 
> > >  if (!handled_component_p (scalar_dest)
> > >      && TREE_CODE (scalar_dest) != MEM_REF)
> > >    return false;
> > 
> > That will accept BIT_FIELD_REF and ARRAY_RANGE_REF (as well as VCE outside 
> > of pattern stmts).
> > The VCEs I hope don't appear, but the first two might, and I'm not sure
> > we are prepared to handle them.  Certainly not BIT_FIELD_REFs.
> > 
> > > > +      rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
> > > > +      if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF)
> > > > +       {
> > > > +         lhs = copy_node (lhs);
> > > 
> > > We don't handle TARGET_MEM_REF in vectorizable_store, so no need to
> > > do it here.  In fact, just unconditionally do ...
> > > 
> > > > +         TREE_TYPE (lhs) = TREE_TYPE (vectype);
> > > > +       }
> > > > +      else
> > > > +       lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
> > > 
> > > ... this (wrap it in a V_C_E).  No need to special-case any
> > > MEM_REFs.
> > 
> > Ok.  After all it seems vectorizable_store pretty much ignores it
> > (except for the scalar_dest check above).  For aliasing it uses the type
> > from DR_REF and otherwise it uses the vectorized type.
> > 
> > > > +      if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
> > > 
> > > This should never be false, so you can as well unconditionally build
> > > the conversion stmt.
> > 
> > You mean because currently adjust_bool_pattern will prefer signed types
> > over unsigned while here lhs will be unsigned?  I guess I should
> > change it to use signed type for the memory store too to avoid the extra
> > cast instead.  Both types can certainly be the same precision, e.g. for:
> > unsigned char a[N], b[N];
> > unsigned int d[N], e[N];
> > bool c[N];
> > ...
> >   for (i = 0; i < N; ++i)
> >     c[i] = a[i] < b[i];
> > or different precision, e.g. for:
> >   for (i = 0; i < N; ++i)
> >     c[i] = d[i] < e[i];
> > 
> > > > @@ -347,6 +347,28 @@ vect_determine_vectorization_factor (loo
> > > >               gcc_assert (STMT_VINFO_DATA_REF (stmt_info)
> > > >                           || is_pattern_stmt_p (stmt_info));
> > > >               vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > > +             if (STMT_VINFO_DATA_REF (stmt_info))
> > > > +               {
> > > > +                 struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> > > > +                 tree scalar_type = TREE_TYPE (DR_REF (dr));
> > > > +                 /* vect_analyze_data_refs will allow bool writes through,
> > > > +                    in order to allow vect_recog_bool_pattern to transform
> > > > +                    those.  If they couldn't be transformed, give up now.  */
> > > > +                 if (((TYPE_PRECISION (scalar_type) == 1
> > > > +                       && TYPE_UNSIGNED (scalar_type))
> > > > +                      || TREE_CODE (scalar_type) == BOOLEAN_TYPE)
> > > 
> > > Shouldn't it be always possible to vectorize those?  For loads
> > > we can assume the memory contains only 1 or 0 (we assume that for
> > > scalar loads), for stores we can mask out all other bits explicitly
> > > if you add support for truncating conversions to non-mode precision
> > > (in fact, we could support non-mode precision vectorization that way,
> > > if not support bitfield loads or extending conversions).
> > 
> > Not without the pattern recognizer transforming it into something.
> > That is something we discussed on IRC before I started working on the
> > first vect_recog_bool_pattern patch: we'd need to special-case bool and
> > one-bit precision types in way too many places all around the vectorizer.
> > Another reason for that was that what vect_recog_bool_pattern does currently
> > is certainly way faster than what we would end up with if we just handled
> > bool as unsigned (or signed?) char with masking on casts and stores
> > - the ability to use any integer type for the bools rather than char
> > as appropriate means we can avoid many VEC_PACK_TRUNC_EXPRs and
> > corresponding VEC_UNPACK_{LO,HI}_EXPRs.
> > So the chosen solution was to attempt to transform some of the bool patterns
> > into something the vectorizer can handle easily.
> > And what it handles can be extended over time.
> > 
> > The above just reflects that; probably I'm just being too cautious, as
> > the vectorization would likely fail on the stmt feeding the store anyway,
> > because get_vectype_for_scalar_type would fail on it.
> > 
> > If we wanted to support general
> > TYPE_PRECISION != GET_MODE_BITSIZE (TYPE_MODE)
> > vectorization (hopefully with still preserving the pattern bool recognizer
> > for the above stated reasons), we'd start with changing
> > get_vectype_for_scalar_type to handle those types (then the
> > tree-vect-data-refs.c and tree-vect-loop.c changes from this patch would
> > be unnecessary), but then we'd need to handle it in other places too
> > (I guess loads would be fine (unless BIT_FIELD_REF loads), but then
> > casts and stores need extra code).
> 
> This is what I have right now, bootstrapped and tested on 
> x86_64-unknown-linux-gnu.  I do see
> 
> FAIL: gfortran.dg/logical_dot_product.f90  -O3 -fomit-frame-pointer  (internal compiler error)
> FAIL: gfortran.dg/mapping_1.f90  -O3 -fomit-frame-pointer  (internal compiler error)
> FAIL: gfortran.fortran-torture/execute/pr43390.f90,  -O3 -g  (internal compiler error)
> 
> so there is some fallout, but somebody broke dejagnu enough that
> I can't easily debug this right now, so I'm postponing it until
> that is fixed.
> 
> It doesn't seem to break any testcases for Bool vectorization.

This one bootstraps and regtests fine on x86_64-unknown-linux-gnu.
I didn't find a good pattern to split out; eventually, how we call
the vectorizable_* routines should be refactored a bit.

Does this look ok to you?

Thanks,
Richard.

2011-10-24  Richard Guenther  <rguent...@suse.de>

        * tree-vect-stmts.c (vect_get_vec_def_for_operand): Convert constants
        to vector element type.
        (vectorizable_assignment): Bail out for non-mode-precision operations.
        (vectorizable_shift): Likewise.
        (vectorizable_operation): Likewise.
        (vectorizable_type_demotion): Likewise.
        (vectorizable_type_promotion): Likewise.
        (vectorizable_store): Handle non-mode-precision stores.
        (vectorizable_load): Handle non-mode-precision loads.
        (get_vectype_for_scalar_type_and_size): Return a vector type
        for non-mode-precision integers.
        * tree-vect-loop.c (vectorizable_reduction): Bail out for
        non-mode-precision reductions.

        * gcc.dg/vect/vect-bool-1.c: New testcase.

Index: gcc/tree-vect-stmts.c
===================================================================
*** gcc/tree-vect-stmts.c       (revision 180380)
--- gcc/tree-vect-stmts.c       (working copy)
*************** vect_get_vec_def_for_operand (tree op, g
*** 1204,1210 ****
          if (vect_print_dump_info (REPORT_DETAILS))
            fprintf (vect_dump, "Create vector_cst. nunits = %d", nunits);
  
!         vec_cst = build_vector_from_val (vector_type, op);
          return vect_init_vector (stmt, vec_cst, vector_type, NULL);
        }
  
--- 1204,1212 ----
          if (vect_print_dump_info (REPORT_DETAILS))
            fprintf (vect_dump, "Create vector_cst. nunits = %d", nunits);
  
!         vec_cst = build_vector_from_val (vector_type,
!                                        fold_convert (TREE_TYPE (vector_type),
!                                                      op));
          return vect_init_vector (stmt, vec_cst, vector_type, NULL);
        }
  
*************** vectorizable_assignment (gimple stmt, gi
*** 2173,2178 ****
--- 2175,2199 ----
              != GET_MODE_SIZE (TYPE_MODE (vectype_in)))))
      return false;
  
+   /* We do not handle bit-precision changes.  */
+   if ((CONVERT_EXPR_CODE_P (code)
+        || code == VIEW_CONVERT_EXPR)
+       && INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+       && ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+          != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+         || ((TYPE_PRECISION (TREE_TYPE (op))
+              != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op))))))
+       /* But a conversion that does not change the bit-pattern is ok.  */
+       && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+           > TYPE_PRECISION (TREE_TYPE (op)))
+          && TYPE_UNSIGNED (TREE_TYPE (op))))
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "type conversion to/from bit-precision "
+                "unsupported.");
+       return false;
+     }
+ 
    if (!vec_stmt) /* transformation not required.  */
      {
        STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
*************** vectorizable_shift (gimple stmt, gimple_
*** 2326,2331 ****
--- 2347,2359 ----
  
    scalar_dest = gimple_assign_lhs (stmt);
    vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+   if (TYPE_PRECISION (TREE_TYPE (scalar_dest))
+       != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "bit-precision shifts not supported.");
+       return false;
+     }
  
    op0 = gimple_assign_rhs1 (stmt);
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
*************** vectorizable_operation (gimple stmt, gim
*** 2660,2665 ****
--- 2688,2708 ----
    scalar_dest = gimple_assign_lhs (stmt);
    vectype_out = STMT_VINFO_VECTYPE (stmt_info);
  
+   /* Most operations cannot handle bit-precision types without extra
+      truncations.  */
+   if ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+        != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+       /* Exception are bitwise operations.  */
+       && code != BIT_IOR_EXPR
+       && code != BIT_XOR_EXPR
+       && code != BIT_AND_EXPR
+       && code != BIT_NOT_EXPR)
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "bit-precision arithmetic not supported.");
+       return false;
+     }
+ 
    op0 = gimple_assign_rhs1 (stmt);
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
                             &def_stmt, &def, &dt[0], &vectype))
*************** vectorizable_type_demotion (gimple stmt,
*** 3082,3090 ****
    if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
          && INTEGRAL_TYPE_P (TREE_TYPE (op0)))
         || (SCALAR_FLOAT_TYPE_P (TREE_TYPE (scalar_dest))
!            && SCALAR_FLOAT_TYPE_P (TREE_TYPE (op0))
!            && CONVERT_EXPR_CODE_P (code))))
      return false;
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
                             &def_stmt, &def, &dt[0], &vectype_in))
      {
--- 3125,3144 ----
    if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
          && INTEGRAL_TYPE_P (TREE_TYPE (op0)))
         || (SCALAR_FLOAT_TYPE_P (TREE_TYPE (scalar_dest))
!            && SCALAR_FLOAT_TYPE_P (TREE_TYPE (op0)))))
      return false;
+ 
+   if (INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+       && ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+          != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+         || ((TYPE_PRECISION (TREE_TYPE (op0))
+              != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op0)))))))
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "type demotion to/from bit-precision unsupported.");
+       return false;
+     }
+ 
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
                             &def_stmt, &def, &dt[0], &vectype_in))
      {
*************** vectorizable_type_promotion (gimple stmt
*** 3365,3370 ****
--- 3419,3437 ----
             && SCALAR_FLOAT_TYPE_P (TREE_TYPE (op0))
             && CONVERT_EXPR_CODE_P (code))))
      return false;
+ 
+   if (INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
+       && ((TYPE_PRECISION (TREE_TYPE (scalar_dest))
+          != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (scalar_dest))))
+         || ((TYPE_PRECISION (TREE_TYPE (op0))
+              != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op0)))))))
+     {
+       if (vect_print_dump_info (REPORT_DETAILS))
+         fprintf (vect_dump, "type promotion to/from bit-precision "
+                "unsupported.");
+       return false;
+     }
+ 
    if (!vect_is_simple_use_1 (op0, loop_vinfo, bb_vinfo,
                             &def_stmt, &def, &dt[0], &vectype_in))
      {
*************** vectorizable_store (gimple stmt, gimple_
*** 3673,3689 ****
        return false;
      }
  
-   /* The scalar rhs type needs to be trivially convertible to the vector
-      component type.  This should always be the case.  */
    elem_type = TREE_TYPE (vectype);
-   if (!useless_type_conversion_p (elem_type, TREE_TYPE (op)))
-     {
-       if (vect_print_dump_info (REPORT_DETAILS))
-         fprintf (vect_dump, "???  operands of different types");
-       return false;
-     }
- 
    vec_mode = TYPE_MODE (vectype);
    /* FORNOW. In some cases can vectorize even if data-type not supported
       (e.g. - array initialization with 0).  */
    if (optab_handler (mov_optab, vec_mode) == CODE_FOR_nothing)
--- 3740,3748 ----
        return false;
      }
  
    elem_type = TREE_TYPE (vectype);
    vec_mode = TYPE_MODE (vectype);
+ 
    /* FORNOW. In some cases can vectorize even if data-type not supported
       (e.g. - array initialization with 0).  */
    if (optab_handler (mov_optab, vec_mode) == CODE_FOR_nothing)
*************** vectorizable_load (gimple stmt, gimple_s
*** 4117,4123 ****
    bool strided_load = false;
    bool load_lanes_p = false;
    gimple first_stmt;
-   tree scalar_type;
    bool inv_p;
    bool negative;
    bool compute_in_loop = false;
--- 4176,4181 ----
*************** vectorizable_load (gimple stmt, gimple_s
*** 4192,4198 ****
        return false;
      }
  
!   scalar_type = TREE_TYPE (DR_REF (dr));
    mode = TYPE_MODE (vectype);
  
    /* FORNOW. In some cases can vectorize even if data-type not supported
--- 4250,4256 ----
        return false;
      }
  
!   elem_type = TREE_TYPE (vectype);
    mode = TYPE_MODE (vectype);
  
    /* FORNOW. In some cases can vectorize even if data-type not supported
*************** vectorizable_load (gimple stmt, gimple_s
*** 4204,4219 ****
        return false;
      }
  
-   /* The vector component type needs to be trivially convertible to the
-      scalar lhs.  This should always be the case.  */
-   elem_type = TREE_TYPE (vectype);
-   if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), elem_type))
-     {
-       if (vect_print_dump_info (REPORT_DETAILS))
-         fprintf (vect_dump, "???  operands of different types");
-       return false;
-     }
- 
    /* Check if the load is a part of an interleaving chain.  */
    if (STMT_VINFO_STRIDED_ACCESS (stmt_info))
      {
--- 4262,4267 ----
*************** vectorizable_load (gimple stmt, gimple_s
*** 4560,4566 ****
                    msq = new_temp;
  
                    bump = size_binop (MULT_EXPR, vs_minus_1,
!                                      TYPE_SIZE_UNIT (scalar_type));
                    ptr = bump_vector_ptr (dataref_ptr, NULL, gsi, stmt, bump);
                    new_stmt = gimple_build_assign_with_ops
                                 (BIT_AND_EXPR, NULL_TREE, ptr,
--- 4608,4614 ----
                    msq = new_temp;
  
                    bump = size_binop (MULT_EXPR, vs_minus_1,
!                                      TYPE_SIZE_UNIT (elem_type));
                    ptr = bump_vector_ptr (dataref_ptr, NULL, gsi, stmt, bump);
                    new_stmt = gimple_build_assign_with_ops
                                 (BIT_AND_EXPR, NULL_TREE, ptr,
*************** get_vectype_for_scalar_type_and_size (tr
*** 5441,5453 ****
    if (nbytes < TYPE_ALIGN_UNIT (scalar_type))
      return NULL_TREE;
  
!   /* If we'd build a vector type of elements whose mode precision doesn't
!      match their types precision we'll get mismatched types on vector
!      extracts via BIT_FIELD_REFs.  This effectively means we disable
!      vectorization of bool and/or enum types in some languages.  */
    if (INTEGRAL_TYPE_P (scalar_type)
        && GET_MODE_BITSIZE (inner_mode) != TYPE_PRECISION (scalar_type))
!     return NULL_TREE;
  
    if (GET_MODE_CLASS (inner_mode) != MODE_INT
        && GET_MODE_CLASS (inner_mode) != MODE_FLOAT)
--- 5489,5502 ----
    if (nbytes < TYPE_ALIGN_UNIT (scalar_type))
      return NULL_TREE;
  
!   /* For vector types of elements whose mode precision doesn't
!      match their type's precision we use an element type of mode
!      precision.  The vectorization routines will have to make sure
!      they support the proper result truncation/extension.  */
    if (INTEGRAL_TYPE_P (scalar_type)
        && GET_MODE_BITSIZE (inner_mode) != TYPE_PRECISION (scalar_type))
!     scalar_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (inner_mode),
!                                                 TYPE_UNSIGNED (scalar_type));
  
    if (GET_MODE_CLASS (inner_mode) != MODE_INT
        && GET_MODE_CLASS (inner_mode) != MODE_FLOAT)
Index: gcc/tree-vect-loop.c
===================================================================
*** gcc/tree-vect-loop.c        (revision 180380)
--- gcc/tree-vect-loop.c        (working copy)
*************** vectorizable_reduction (gimple stmt, gim
*** 4422,4427 ****
--- 4422,4432 ----
        && !SCALAR_FLOAT_TYPE_P (scalar_type))
      return false;
  
+   /* Do not try to vectorize bit-precision reductions.  */
+   if ((TYPE_PRECISION (scalar_type)
+        != GET_MODE_PRECISION (TYPE_MODE (scalar_type))))
+     return false;
+ 
    /* All uses but the last are expected to be defined in the loop.
       The last use is the reduction variable.  In case of nested cycle this
       assumption is not true: we use reduc_index to record the index of the
Index: gcc/testsuite/gcc.dg/vect/vect-bool-1.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/vect-bool-1.c     (revision 0)
--- gcc/testsuite/gcc.dg/vect/vect-bool-1.c     (revision 0)
***************
*** 0 ****
--- 1,15 ----
+ /* { dg-do compile } */
+ /* { dg-require-effective-target vect_int } */
+ 
+ _Bool a[1024];
+ _Bool b[1024];
+ _Bool c[1024];
+ void foo (void)
+ {
+   unsigned i;
+   for (i = 0; i < 1024; ++i)
+     a[i] = b[i] | c[i];
+ }
+ 
+ /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
