On Tue, Aug 15, 2023 at 10:44 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Biener <richard.guent...@gmail.com> writes:
> > On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> on 2023/8/14 22:16, Richard Sandiford wrote:
> >> > No, it was more that 219-142=77, so it seems like a lot of lines
> >> > are being duplicated rather than simply being moved.  (Unlike for
> >> > VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
> >> > was a clear improvement.)
> >> >
> >> > So I was just wondering if there was any obvious factoring-out that
> >> > could be done to reduce the duplication.
> >>
> >> ah, thanks for the clarification!
> >>
> >> I think the main duplication is at the beginning and end of the loop
> >> body; let's look at it in detail:
> >>
> >> +  if (memory_access_type == VMAT_GATHER_SCATTER)
> >> +    {
> >> +      gcc_assert (alignment_support_scheme == dr_aligned
> >> +                  || alignment_support_scheme == dr_unaligned_supported);
> >> +      gcc_assert (!grouped_load && !slp_perm);
> >> +
> >> +      unsigned int inside_cost = 0, prologue_cost = 0;
> >>
> >> // These above are newly added.
> >>
> >> +      for (j = 0; j < ncopies; j++)
> >> +        {
> >> +          /* 1. Create the vector or array pointer update chain.  */
> >> +          if (j == 0 && !costing_p)
> >> +            {
> >> +              if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +                vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
> >> +                                             slp_node, &gs_info,
> >> +                                             &dataref_ptr, &vec_offsets);
> >> +              else
> >> +                dataref_ptr
> >> +                  = vect_create_data_ref_ptr (vinfo, first_stmt_info,
> >> +                                              aggr_type, at_loop, offset,
> >> +                                              &dummy, gsi, &ptr_incr,
> >> +                                              false, bump);
> >> +            }
> >> +          else if (!costing_p)
> >> +            {
> >> +              gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> >> +              if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +                dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr,
> >> +                                               ptr_incr, gsi, stmt_info,
> >> +                                               bump);
> >> +            }
> >>
> >> // These lines are for dataref_ptr.  In the final loop nest we deal with
> >> // more cases on simd_lane_access_p and diff_first_stmt_info, but don't
> >> // handle STMT_VINFO_GATHER_SCATTER_P any more; very little (one case)
> >> // can be shared between the two, so IMHO factoring it out seems like
> >> // overkill.
> >>
> >> +          if (mask && !costing_p)
> >> +            vec_mask = vec_masks[j];
> >>
> >> // This is merged out from the j == 0 and j != 0 arms.
> >>
> >> +
> >> +          gimple *new_stmt = NULL;
> >> +          for (i = 0; i < vec_num; i++)
> >> +            {
> >> +              tree final_mask = NULL_TREE;
> >> +              tree final_len = NULL_TREE;
> >> +              tree bias = NULL_TREE;
> >> +              if (!costing_p)
> >> +                {
> >> +                  if (loop_masks)
> >> +                    final_mask
> >> +                      = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> >> +                                            vec_num * ncopies, vectype,
> >> +                                            vec_num * j + i);
> >> +                  if (vec_mask)
> >> +                    final_mask = prepare_vec_mask (loop_vinfo,
> >> +                                                   mask_vectype,
> >> +                                                   final_mask, vec_mask,
> >> +                                                   gsi);
> >> +
> >> +                  if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +                    dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr,
> >> +                                                   ptr_incr, gsi,
> >> +                                                   stmt_info, bump);
> >> +                }
> >>
> >> // This part is directly copied from the original; the original gets
> >> // updated by removing "&& !STMT_VINFO_GATHER_SCATTER_P".  Due to its
> >> // size, I didn't consider this before; do you prefer me to factor this
> >> // part out?
> >>
> >> +              if (gs_info.ifn != IFN_LAST)
> >> +                {
> >> ...
> >> +                }
> >> +              else
> >> +                {
> >> +                  /* Emulated gather-scatter.  */
> >> ...
> >>
> >> // This part is just moved from the original.
> >>
> >> +              vec_dest = vect_create_destination_var (scalar_dest,
> >> +                                                      vectype);
> >> +              /* DATA_REF is null if we've already built the statement.  */
> >> +              if (data_ref)
> >> +                {
> >> +                  vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr));
> >> +                  new_stmt = gimple_build_assign (vec_dest, data_ref);
> >> +                }
> >> +              new_temp = make_ssa_name (vec_dest, new_stmt);
> >> +              gimple_set_lhs (new_stmt, new_temp);
> >> +              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt,
> >> +                                           gsi);
> >> +
> >> +              /* Store vector loads in the corresponding SLP_NODE.  */
> >> +              if (slp)
> >> +                slp_node->push_vec_def (new_stmt);
> >> +
> >> +              if (!slp && !costing_p)
> >> +                STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> >> +            }
> >> +
> >> +          if (!slp && !costing_p)
> >> +            *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> >>
> >> // This part is some subsequent handling.  It's duplicated from the
> >> // original but drops some more useless code.  I guess this part is not
> >> // worth factoring out?
> >>
> >> +      if (costing_p)
> >> +        {
> >> +          if (dump_enabled_p ())
> >> +            dump_printf_loc (MSG_NOTE, vect_location,
> >> +                             "vect_model_load_cost: inside_cost = %u, "
> >> +                             "prologue_cost = %u .\n",
> >> +                             inside_cost, prologue_cost);
> >> +        }
> >> +      return true;
> >> +    }
> >>
> >> // This duplicates the dumping; I guess it's unnecessary to factor it
> >> // out.
> >>
> >> oh, I just noticed that this should be shortened to
> >> "if (costing_p && dump_enabled_p ())" instead, just the same as what's
> >> adopted for the VMAT_LOAD_STORE_LANES dumping.
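To make the factoring question above concrete, here is a minimal standalone sketch of the pattern being discussed (plain C++, not GCC internals; `advance_dataref_ptr` and `sum_with_shared_prologue` are invented names for illustration): the "j == 0 creates the pointer, j > 0 bumps it" prologue collapses into one helper, so a later fix lands in one place instead of one copy per VMAT_* branch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-copy pointer prologue: first copy
// (j == 0) creates the data pointer, later copies bump it by STEP.
static const float *
advance_dataref_ptr (const float *base, const float *cur,
                     std::size_t step, unsigned j)
{
  return j == 0 ? base : cur + step;
}

// Toy consumer: each "copy" loads one element through the shared
// prologue helper instead of duplicating the create/bump logic inline.
float
sum_with_shared_prologue (const std::vector<float> &data,
                          unsigned ncopies, std::size_t step)
{
  float total = 0.0f;
  const float *ptr = nullptr;
  for (unsigned j = 0; j < ncopies; j++)
    {
      ptr = advance_dataref_ptr (data.data (), ptr, step, j);
      total += *ptr;  /* stand-in for the per-copy vector load */
    }
  return total;
}
```

This is only a shape argument, of course; whether the real prologue (with simd_lane_access_p, diff_first_stmt_info, etc.) shares enough to justify a helper is exactly the question at hand.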
>
> > Just to mention, the original motivational idea was even though we
> > duplicate some code we make it overall more readable and thus
> > maintainable.
>
> Not sure I necessarily agree with the "thus".  Maybe it tends to be
> true with a good, well-factored API.  But the internal vector APIs
> make it extremely easy to get things wrong.  If we have multiple copies
> of the same operation, the tendency is to fix bugs in the copy that the
> bugs were seen in.  It's easy to forget that other copies exist
> elsewhere that probably need updating in the same way.
>
> > In the end we might have vectorizable_load () for analysis but have
> > not only load_vec_info_type but one for each VMAT_*, which means
> > multiple separate vect_transform_load () functions.  Currently
> > vectorizable_load is structured very inconsistently; having the
> > transforms all hang off a single switch (vmat-kind) {} would be an
> > improvement IMHO.
>
> Yeah, agree vectorizable_load ought to be refactored.
>
> > But sure some of our internal APIs are verbose and maybe badly
> > factored; any improvement there is welcome.  Inventing new random APIs
> > just to save a few lines of code without actually making the code more
> > readable is IMHO bad.
>
> OK, fair enough.  So the idea is: see where we end up and then try to
> improve/factor the APIs in a less peephole way?
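In miniature, the switch (vmat-kind) {} structure suggested above would look something like the following sketch (a toy dispatcher with a made-up enum and no-op handlers, not the real vectorizable_load or GCC's vect_memory_access_type): one transform routine per access kind, selected from a single switch, rather than all kinds interleaved in one large body.

```cpp
// Illustrative enum; GCC's real vect_memory_access_type has more kinds.
enum vect_memory_access_type
{
  VMAT_CONTIGUOUS,
  VMAT_LOAD_STORE_LANES,
  VMAT_GATHER_SCATTER
};

// No-op stand-ins for per-kind transform routines.
static bool transform_contiguous_load () { return true; }
static bool transform_load_store_lanes () { return true; }
static bool transform_gather_scatter () { return true; }

// Hypothetical vect_transform_load: each VMAT_* kind gets its own
// routine, and the dispatch is one flat switch.
bool
vect_transform_load (vect_memory_access_type kind)
{
  switch (kind)
    {
    case VMAT_CONTIGUOUS:
      return transform_contiguous_load ();
    case VMAT_LOAD_STORE_LANES:
      return transform_load_store_lanes ();
    case VMAT_GATHER_SCATTER:
      return transform_gather_scatter ();
    }
  return false;
}
```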
Yeah, I think that's the only good way forward.

> Thanks,
> Richard
>
> > But, if we can for example enhance prepare_vec_mask to handle both loop
> > and conditional mask and handle querying the mask that would be fine
> > (of course you need to check all uses to see if that makes sense).
> >
> > Richard.
> >
> >> BR,
> >> Kewen
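As a rough model of what a combined mask-preparation helper computes, semantically: the effective per-lane mask is the lane-wise AND of the loop mask (which lanes are still in range) and the conditional mask (which lanes the statement's predicate enables). This sketch is illustrative only; the real prepare_vec_mask builds gimple statements, and `combine_masks` here is an invented name operating on plain bool vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of mask combination: final_mask[i] is set only when both
// the loop mask and the conditional mask enable lane i.
std::vector<bool>
combine_masks (const std::vector<bool> &loop_mask,
               const std::vector<bool> &cond_mask)
{
  assert (loop_mask.size () == cond_mask.size ());
  std::vector<bool> final_mask (loop_mask.size ());
  for (std::size_t i = 0; i < loop_mask.size (); i++)
    final_mask[i] = loop_mask[i] && cond_mask[i];
  return final_mask;
}
```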