> Am 06.09.2024 um 15:28 schrieb Tamar Christina <tamar.christ...@arm.com>:
> 
> 
>> 
>> -----Original Message-----
>> From: Richard Biener <rguent...@suse.de>
>> Sent: Friday, September 6, 2024 2:09 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; j...@ventanamicro.com
>> Subject: Re: [PATCH 2/4]middle-end: lower COND_EXPR into gimple form in
>> vect_recog_bool_pattern
>> 
>>> On Tue, 3 Sep 2024, Tamar Christina wrote:
>>> 
>>> Hi All,
>>> 
>>> Currently the vectorizer cheats when lowering COND_EXPR during bool recog.
>>> In the cases where the conditonal is loop invariant or non-boolean it 
>>> instead
>>> converts the operation back into GENERIC and hides much of the operation 
>>> from
>>> the analysis part of the vectorizer.
>>> 
>>> i.e.
>>> 
>>>  a ? b : c
>>> 
>>> is transformed into:
>>> 
>>>  a != 0 ? b : c
>>> 
>>> however by doing so we can't perform any optimization on the mask as they
>> aren't
>>> explicit until quite late during codegen.
>>> 
>>> To fix this this patch lowers booleans earlier and so ensures that we are 
>>> always
>>> in GIMPLE.
>>> 
>>> For when the value is a loop invariant boolean we have to generate an 
>>> additional
>>> conversion from bool to the integer mask form.
>>> 
>>> This is done by creating a loop invariant a ? -1 : 0 with the target mask
>>> precision and then doing a normal != 0 comparison on that.
>>> 
>>> To support this the patch also adds the ability to during pattern matching
>>> create a loop invariant pattern that won't be seen by the vectorizer and 
>>> will
>>> instead me materialized inside the loop preheader in the case of loops, or 
>>> in
>>> the case of BB vectorization it materializes it in the first BB in the 
>>> region.
>>> 
>>> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
>>> x86_64-pc-linux-gnu -m32, -m64 and no issues.
>>> 
>>> Ok for master?
>> 
>> OK, but can you clarify a question below?
>> 
>>> Thanks,
>>> Tamar
>>> 
>>> gcc/ChangeLog:
>>> 
>>>    * tree-vect-patterns.cc (append_inv_pattern_def_seq): New.
>>>    (vect_recog_bool_pattern): Lower COND_EXPRs.
>>>    * tree-vect-slp.cc (vect_schedule_slp): Materialize loop invariant
>>>    statements.
>>>    * tree-vect-loop.cc (vect_transform_loop): Likewise.
>>>    * tree-vect-stmts.cc (vectorizable_comparison_1): Remove
>>>    VECT_SCALAR_BOOLEAN_TYPE_P handling for vectype.
>>>    * tree-vectorizer.cc (vec_info::vec_info): Initialize
>>>    inv_pattern_def_seq.
>>>    * tree-vectorizer.h (LOOP_VINFO_INV_PATTERN_DEF_SEQ): New.
>>>    (class vec_info): Add inv_pattern_def_seq.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>    * gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
>>>    * gcc.dg/vect/vect-conditional_store_5.c: New test.
>>>    * gcc.dg/vect/vect-conditional_store_6.c: New test.
>>> 
>>> ---
>>> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
>> b/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
>>> new file mode 100644
>>> index
>> 0000000000000000000000000000000000000000..650a3bfbfb1dd44afc2d58b
>> be85f75f1d28b9bd0
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
>>> @@ -0,0 +1,15 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-require-effective-target vect_float } */
>>> +
>>> +/* { dg-additional-options "-mavx2" { target avx2 } } */
>>> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
>>> +
>>> +void foo3 (float *restrict a, int *restrict c)
>>> +{
>>> +#pragma GCC unroll 8
>>> +  for (int i = 0; i < 8; i++)
>>> +    c[i] = a[i] > 1.0;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "vectorized using SLP" "slp1" } } */
>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
>> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
>>> new file mode 100644
>>> index
>> 0000000000000000000000000000000000000000..37d60fa76351c13980427
>> 751be4450c14617a9a9
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
>>> @@ -0,0 +1,28 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-require-effective-target vect_masked_store } */
>>> +
>>> +/* { dg-additional-options "-mavx2" { target avx2 } } */
>>> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
>>> +
>>> +#include <stdbool.h>
>>> +
>>> +void foo3 (float *restrict a, int *restrict b, int *restrict c, int n, int 
>>> stride)
>>> +{
>>> +  if (stride <= 1)
>>> +    return;
>>> +
>>> +  bool ai = a[0];
>>> +
>>> +  for (int i = 0; i < n; i++)
>>> +    {
>>> +      int res = c[i];
>>> +      int t = b[i+stride];
>>> +      if (ai)
>>> +        t = res;
>>> +      c[i] = t;
>>> +    }
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>>> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { target 
>>> aarch64-
>> *-* } } } */
>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
>> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
>>> new file mode 100644
>>> index
>> 0000000000000000000000000000000000000000..5e1aedf3726b073c132bb6
>> 4a9b474592ceb8e9b9
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
>>> @@ -0,0 +1,24 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-require-effective-target vect_masked_store } */
>>> +
>>> +/* { dg-additional-options "-mavx2" { target avx2 } } */
>>> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
>>> +
>>> +void foo3 (unsigned long long *restrict a, int *restrict b, int *restrict 
>>> c, int n, int
>> stride)
>>> +{
>>> +  if (stride <= 1)
>>> +    return;
>>> +
>>> +  for (int i = 0; i < n; i++)
>>> +    {
>>> +      int res = c[i];
>>> +      int t = b[i+stride];
>>> +      if (a[i])
>>> +        t = res;
>>> +      c[i] = t;
>>> +    }
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>>> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { target 
>>> aarch64-
>> *-* } } } */
>>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>>> index
>> 6456220cdc9bb0ba35baf04c555060ea38d13bbc..e4bed1f88435cb6ebad5651a
>> 266a9c74106500c0 100644
>>> --- a/gcc/tree-vect-loop.cc
>>> +++ b/gcc/tree-vect-loop.cc
>>> @@ -12404,6 +12404,18 @@ vect_transform_loop (loop_vec_info loop_vinfo,
>> gimple *loop_vectorized_call)
>>>       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
>>>     }
>>> 
>>> +  /* Generate the loop invariant statements.  */
>>> +  if (!gimple_seq_empty_p (LOOP_VINFO_INV_PATTERN_DEF_SEQ
>> (loop_vinfo)))
>>> +    {
>>> +      if (dump_enabled_p ())
>>> +    dump_printf_loc (MSG_NOTE, vect_location,
>>> +             "------>generating loop invariant statements\n");
>>> +      gimple_stmt_iterator gsi;
>>> +      gsi = gsi_after_labels (loop_preheader_edge (loop)->src);
>>> +      gsi_insert_seq_before (&gsi, LOOP_VINFO_INV_PATTERN_DEF_SEQ
>> (loop_vinfo),
>>> +                 GSI_CONTINUE_LINKING);
>>> +    }
>>> +
>> 
>> So this is one instance ...
>> 
>>>   /* FORNOW: the vectorizer supports only loops which body consist
>>>      of one basic block (header + empty latch). When the vectorizer will
>>>      support more involved loop forms, the order by which the BBs are
>>> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>>> index
>> 4b112910df357e9f2783f7173b71812085126389..3aae6a066954d9e1e0d4180
>> 3dd43307fa297af67 100644
>>> --- a/gcc/tree-vect-patterns.cc
>>> +++ b/gcc/tree-vect-patterns.cc
>>> @@ -182,6 +182,17 @@ append_pattern_def_seq (vec_info *vinfo,
>>>                      new_stmt);
>>> }
>>> 
>>> +
>>> +/* Add NEW_STMT to VINFO's invariant pattern definition statements.  These
>>> +   statements are not vectorized but are materialized as scalar in the loop
>>> +   preheader.  */
>>> +
>>> +static inline void
>>> +append_inv_pattern_def_seq (vec_info *vinfo, gimple *new_stmt)
>>> +{
>>> +  gimple_seq_add_stmt_without_update (&vinfo->inv_pattern_def_seq,
>> new_stmt);
>>> +}
>>> +
>>> /* The caller wants to perform new operations on vect_external variable
>>>    VAR, so that the result of the operations would also be vect_external.
>>>    Return the edge on which the operations can be performed, if one exists.
>>> @@ -5983,12 +5994,34 @@ vect_recog_bool_pattern (vec_info *vinfo,
>>>    var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
>>>       else if (integer_type_for_mask (var, vinfo))
>>>    return NULL;
>>> +      else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
>>> +           && !vect_get_internal_def (vinfo, var))
>>> +    {
>>> +      /* If the condition is already a boolean then manually convert it to 
>>> a
>>> +         mask of the given integer type but don't set a vectype.  */
>>> +      tree lhs_ivar = vect_recog_temp_ssa_var (type, NULL);
>>> +      pattern_stmt = gimple_build_assign (lhs_ivar, COND_EXPR, var,
>>> +                          build_all_ones_cst (type),
>>> +                          build_zero_cst (type));
>>> +      append_inv_pattern_def_seq (vinfo, pattern_stmt);
>>> +      var = lhs_ivar;
>>> +    }
>>> +
>>> +      tree lhs_var = vect_recog_temp_ssa_var (boolean_type_node, NULL);
>>> +      pattern_stmt = gimple_build_assign (lhs_var, NE_EXPR, var,
>>> +                      build_zero_cst (TREE_TYPE (var)));
>>> +
>>> +      tree new_vectype = get_mask_type_for_scalar_type (vinfo, TREE_TYPE
>> (var));
>>> +      if (!new_vectype)
>>> +    return NULL;
>>> +
>>> +      new_vectype = truth_type_for (new_vectype);
>>> +      append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt, new_vectype,
>>> +                  TREE_TYPE (var));
>>> 
>>>       lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
>>>       pattern_stmt
>>> -    = gimple_build_assign (lhs, COND_EXPR,
>>> -                   build2 (NE_EXPR, boolean_type_node,
>>> -                       var, build_int_cst (TREE_TYPE (var), 0)),
>>> +    = gimple_build_assign (lhs, COND_EXPR, lhs_var,
>>>                   gimple_assign_rhs2 (last_stmt),
>>>                   gimple_assign_rhs3 (last_stmt));
>>>       *type_out = vectype;
>>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>>> index
>> 43ecd2689701451b706b41d73ba60773af4cf8a5..1ea836ca8fe1d4867504c6da
>> 42a40c22b6638385 100644
>>> --- a/gcc/tree-vect-slp.cc
>>> +++ b/gcc/tree-vect-slp.cc
>>> @@ -10393,4 +10393,23 @@ vect_schedule_slp (vec_info *vinfo, const
>> vec<slp_instance> &slp_instances)
>>>        SLP_TREE_REPRESENTATIVE (root) = NULL;
>>>         }
>>>     }
>>> +
>>> +  /* Generate the invariant statements.  */
>>> +  if (!gimple_seq_empty_p (vinfo->inv_pattern_def_seq))
>>> +    {
>>> +      if (dump_enabled_p ())
>>> +    dump_printf_loc (MSG_NOTE, vect_location,
>>> +             "------>generating invariant statements\n");
>>> +      gimple_stmt_iterator gsi;
>>> +      basic_block bb_inv = vinfo->bbs[0];
>>> +      loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
>>> +      if (loop_vinfo)
>>> +    bb_inv = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo))->src;
>>> +
>>> +      gsi = gsi_after_labels (bb_inv);
>>> +      gsi_insert_seq_before (&gsi, vinfo->inv_pattern_def_seq,
>>> +                 GSI_CONTINUE_LINKING);
>>> +       vinfo->inv_pattern_def_seq = NULL;
>>> +    }
>> 
>> ... and this another.  This one is only required for BB vectorization?
>> In that context it should probably best be done from vect_slp_region
>> before the vect_schedule_slp call or at least guarded with
>> BB vect?
> 
> Yeah indeed,  I was searching for a place to put it only for BB vectorization
> But couldn't find a proper one.  It's clearing the list at the end to prevent
> a double emission.  But indeed it's cleaner to just guard it with a BB vect 
> check.
> I'll try moving it first.
> 
>> 
>> I suspect you ran into this only for BB vectorization from loop
>> vect as that runs on if-converted code?
> 
> Yeah, pure BB vectorization I couldn't trigger, because of the obvious issue 
> that
> for it to work it needs if-cvt.  So I can just assert and reject it,  but the 
> reason
> I kept it was I will need it for the early break BB vectorization which 
> doesn't need
> Ifcvt and it seems trivial that the only place we can insert them is at the 
> top of the
> first BB in the SLP region.
> 
> But would you prefer an assert for now?

No, guard or move it instead

Richard 

> Cheers,
> Tamar
>> 
>> Thanks for clarifying.
>> Richard.
>> 
>>> +
>>> }
>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>>> index
>> 385e63163c2450b1cee9f3c2e5df11636116ec2d..c761360d2e80bbf25f55c00ee
>> 69bbadfcc9128e1 100644
>>> --- a/gcc/tree-vect-stmts.cc
>>> +++ b/gcc/tree-vect-stmts.cc
>>> @@ -12652,11 +12652,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree
>> vectype,
>>>   /* Invariant comparison.  */
>>>   if (!vectype)
>>>     {
>>> -      if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>>> -    vectype = mask_type;
>>> -      else
>>> -    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1),
>>> -                           slp_node);
>>> +      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1), 
>>> slp_node);
>>>       if (!vectype || maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits))
>>>    return false;
>>>     }
>>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>>> index
>> df6c8ada2f7814ac1ea89913e881dd659bd2da62..d3389767325229e95321223
>> 6c248c30697bbce0d 100644
>>> --- a/gcc/tree-vectorizer.h
>>> +++ b/gcc/tree-vectorizer.h
>>> @@ -509,6 +509,12 @@ public:
>>>   /* The count of the basic blocks in the vectorization region.  */
>>>   unsigned int nbbs;
>>> 
>>> +  /* Used to keep a sequence of def stmts of a pattern stmt that are loop
>>> +    invariant if they exists.
>>> +    The sequence is emitted in the loop preheader should the loop be 
>>> vectorized
>>> +    and are reset when undoing patterns.  */
>>> +  gimple_seq inv_pattern_def_seq;
>>> +
>>> private:
>>>   stmt_vec_info new_stmt_vec_info (gimple *stmt);
>>>   void set_vinfo_for_stmt (gimple *, stmt_vec_info, bool = true);
>>> @@ -1039,6 +1045,7 @@ public:
>>> #define LOOP_VINFO_ORIG_LOOP_INFO(L)       (L)->orig_loop_info
>>> #define LOOP_VINFO_SIMD_IF_COND(L)         (L)->simd_if_cond
>>> #define LOOP_VINFO_INNER_LOOP_COST_FACTOR(L) (L)-
>>> inner_loop_cost_factor
>>> +#define LOOP_VINFO_INV_PATTERN_DEF_SEQ(L)  (L)->inv_pattern_def_seq
>>> 
>>> #define LOOP_VINFO_FULLY_MASKED_P(L)        \
>>>   (LOOP_VINFO_USING_PARTIAL_VECTORS_P (L)    \
>>> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
>>> index
>> 1fb4fb36ed44d11baea2d3db8eedf9685be344e8..70b83720fe28b8c8c567f68df
>> 1ceb308d49043dc 100644
>>> --- a/gcc/tree-vectorizer.cc
>>> +++ b/gcc/tree-vectorizer.cc
>>> @@ -465,7 +465,8 @@ vec_info::vec_info (vec_info::vec_kind kind_in,
>> vec_info_shared *shared_)
>>>     shared (shared_),
>>>     stmt_vec_info_ro (false),
>>>     bbs (NULL),
>>> -    nbbs (0)
>>> +    nbbs (0),
>>> +    inv_pattern_def_seq (NULL)
>>> {
>>>   stmt_vec_infos.create (50);
>>> }
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> --
>> Richard Biener <rguent...@suse.de>
>> SUSE Software Solutions Germany GmbH,
>> Frankenstrasse 146, 90461 Nuernberg, Germany;
>> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to