On Tue, Aug 15, 2023 at 10:44 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Biener <richard.guent...@gmail.com> writes:
> > On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> on 2023/8/14 22:16, Richard Sandiford wrote:
> >> > No, it was more that 219-142=77, so it seems like a lot of lines
> >> > are being duplicated rather than simply being moved.  (Unlike for
> >> > VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
> >> > was a clear improvement.)
> >> >
> >> > So I was just wondering if there was any obvious factoring-out that
> >> > could be done to reduce the duplication.
> >>
> >> ah, thanks for the clarification!
> >>
> >> I think the main duplication is at the beginning and end of the loop
> >> body; let's take a look at them in detail:
> >>
> >> +  if (memory_access_type == VMAT_GATHER_SCATTER)
> >> +    {
> >> +      gcc_assert (alignment_support_scheme == dr_aligned
> >> +                 || alignment_support_scheme == dr_unaligned_supported);
> >> +      gcc_assert (!grouped_load && !slp_perm);
> >> +
> >> +      unsigned int inside_cost = 0, prologue_cost = 0;
> >>
> >> // These above are newly added.
> >>
> >> +      for (j = 0; j < ncopies; j++)
> >> +       {
> >> +         /* 1. Create the vector or array pointer update chain.  */
> >> +         if (j == 0 && !costing_p)
> >> +           {
> >> +             if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +               vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
> >> +                                            slp_node, &gs_info, &dataref_ptr,
> >> +                                            &vec_offsets);
> >> +             else
> >> +               dataref_ptr
> >> +                 = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
> >> +                                             at_loop, offset, &dummy, gsi,
> >> +                                             &ptr_incr, false, bump);
> >> +           }
> >> +         else if (!costing_p)
> >> +           {
> >> +             gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> >> +             if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +               dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
> >> +                                              gsi, stmt_info, bump);
> >> +           }
> >>
> >> // These are for dataref_ptr.  In the final loop nest we deal with more
> >> cases for simd_lane_access_p and diff_first_stmt_info, but no longer handle
> >> STMT_VINFO_GATHER_SCATTER_P; very little (one case) can be shared between
> >> them, so IMHO factoring it out seems like overkill.
> >>
> >> +
> >> +         if (mask && !costing_p)
> >> +           vec_mask = vec_masks[j];
> >>
> >> // This is merged out of the j == 0 and j != 0 arms.
> >>
> >> +
> >> +         gimple *new_stmt = NULL;
> >> +         for (i = 0; i < vec_num; i++)
> >> +           {
> >> +             tree final_mask = NULL_TREE;
> >> +             tree final_len = NULL_TREE;
> >> +             tree bias = NULL_TREE;
> >> +             if (!costing_p)
> >> +               {
> >> +                 if (loop_masks)
> >> +                   final_mask
> >> +                     = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> >> +                                           vec_num * ncopies, vectype,
> >> +                                           vec_num * j + i);
> >> +                 if (vec_mask)
> >> +                   final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
> >> +                                                  final_mask, vec_mask, gsi);
> >> +
> >> +                 if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +                   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
> >> +                                                  gsi, stmt_info, bump);
> >> +               }
> >>
> >> // This part is directly copied from the original; the original gets
> >> updated by removing && !STMT_VINFO_GATHER_SCATTER_P.  Due to its size,
> >> I didn't consider factoring it out before; would you prefer me to do so?
> >>
> >> +             if (gs_info.ifn != IFN_LAST)
> >> +               {
> >> ...
> >> +               }
> >> +             else
> >> +               {
> >> +                 /* Emulated gather-scatter.  */
> >> ...
> >>
> >> // This part is just moved from the original.
> >>
> >> +             vec_dest = vect_create_destination_var (scalar_dest, vectype);
> >> +             /* DATA_REF is null if we've already built the statement.  */
> >> +             if (data_ref)
> >> +               {
> >> +                 vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr));
> >> +                 new_stmt = gimple_build_assign (vec_dest, data_ref);
> >> +               }
> >> +             new_temp = make_ssa_name (vec_dest, new_stmt);
> >> +             gimple_set_lhs (new_stmt, new_temp);
> >> +             vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> >> +
> >> +             /* Store vector loads in the corresponding SLP_NODE.  */
> >> +             if (slp)
> >> +               slp_node->push_vec_def (new_stmt);
> >> +
> >> +         if (!slp && !costing_p)
> >> +           STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> >> +       }
> >> +
> >> +      if (!slp && !costing_p)
> >> +       *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> >>
> >> // This part is some subsequent handling; it's duplicated from the
> >> original, but with some more unneeded code removed.  I guess this part
> >> is not worth being factored out?
> >>
> >> +      if (costing_p)
> >> +       {
> >> +         if (dump_enabled_p ())
> >> +           dump_printf_loc (MSG_NOTE, vect_location,
> >> +                            "vect_model_load_cost: inside_cost = %u, "
> >> +                            "prologue_cost = %u .\n",
> >> +                            inside_cost, prologue_cost);
> >> +       }
> >> +      return true;
> >> +    }
> >>
> >> // Duplicating the dumping, I guess it's unnecessary to be factored out.
> >>
> >> oh, I just noticed that this should be shortened to
> >> "if (costing_p && dump_enabled_p ())" instead, just the same as what's
> >> adopted for the VMAT_LOAD_STORE_LANES dumping.
> >
> > Just to mention, the original motivating idea was that even though we
> > duplicate some code, we make it overall more readable and thus
> > maintainable.
>
> Not sure I necessarily agree with the "thus".  Maybe it tends to be
> true with a good, well-factored API.  But the internal vector APIs
> make it extremely easy to get things wrong.  If we have multiple copies
> of the same operation, the tendency is to fix bugs in the copy that the
> bugs were seen in.  It's easy to forget that other copies exist
> elsewhere that probably need updating in the same way.
>
> > In the end we
> > might have vectorizable_load () for analysis but have not only
> > load_vec_info_type but one for each VMAT_* which means multiple separate
> > vect_transform_load () functions.  Currently vectorizable_load is structured
> > very inconsistently; having the transforms all hang off a single
> > switch (vmat-kind) {} would be an improvement IMHO.
>
> Yeah, agree vectorizable_load ought to be refactored.
>
> > But sure some of our internal APIs are verbose and maybe badly factored,
> > any improvement there is welcome.  Inventing new random APIs just to
> > save a few lines of code without actually making the code more readable
> > is IMHO bad.
>
> OK, fair enough.  So the idea is: see where we end up and then try to
> improve/factor the APIs in a less peephole way?

Yeah, I think that's the only good way forward.
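
To make the shape of that concrete, here is a toy sketch of the
switch-based dispatch I have in mind.  The enum values and helper names
below are simplified stand-ins, not the actual vectorizer internals; in
the real refactoring each case would hold the transform code currently
inlined in vectorizable_load:

```cpp
#include <string>

// Simplified stand-ins for the vectorizer's memory-access kinds; the
// real enum (vect_memory_access_type) has more members.
enum vmat_kind
{
  VMAT_CONTIGUOUS,
  VMAT_LOAD_STORE_LANES,
  VMAT_GATHER_SCATTER
};

// Hypothetical per-kind transform helpers.  Each returns a tag here
// just so the dispatch is observable in this sketch.
static std::string transform_contiguous () { return "contiguous"; }
static std::string transform_load_store_lanes () { return "lanes"; }
static std::string transform_gather_scatter () { return "gather-scatter"; }

// The single dispatch point: analysis stays shared, and the transform
// for each memory-access kind hangs off one switch.
std::string
vect_transform_load (vmat_kind kind)
{
  switch (kind)
    {
    case VMAT_CONTIGUOUS:
      return transform_contiguous ();
    case VMAT_LOAD_STORE_LANES:
      return transform_load_store_lanes ();
    case VMAT_GATHER_SCATTER:
      return transform_gather_scatter ();
    }
  return "";
}
```

The point being that each VMAT_* arm then reads top to bottom as one
self-contained transform, instead of being interleaved with the others.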

> Thanks,
> Richard
>
> > But, if we can for example enhance prepare_vec_mask to handle both the
> > loop mask and the conditional mask, and to handle querying the mask,
> > that would be fine (of course you need to check all uses to see if that
> > makes sense).
> >
> > Richard.
> >
> >> BR,
> >> Kewen
