Hi Juzhe,

> +/* Get STORE value.  */
> +static tree
> +get_store_value (gimple *stmt)
> +{
> +  if (is_gimple_call (stmt) && gimple_call_internal_p (stmt))
> +    {
> +      if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
> +        return gimple_call_arg (stmt, 3);
> +      else
> +        gcc_unreachable ();
> +    }
> +  else
> +    return gimple_assign_rhs1 (stmt);
> +}
This was something I was about to mention in the review of v1 that I had
already started.  This is better now.

> +      basic_block bb = e->src;

Rename to pred or so?  And then keep the original bb.

>    if (!live_range)
> -     continue;
> +     {
> +       if (single_succ_p (e->src))
> +         {
> +           /*
> +              <bb 7> [local count: 850510900]:
> +              goto <bb 3>; [100.00%]
> +
> +              Handle this case, we should extend live range of bb 3.
> +            */

/* If the predecessor is an extended basic block extend it and look
   for DEF's definition again.  */

> +           bb = single_succ (e->src);
> +           if (!program_points_per_bb.get (bb))
> +             continue;
> +           live_ranges = live_ranges_per_bb.get (bb);
> +           max_point
> +             = (*program_points_per_bb.get (bb)).length () - 1;
> +           live_range = live_ranges->get (def);
> +           if (!live_range)
> +             continue;
> +         }
> +       else
> +         continue;
> +     }

We're approaching a level of complexity where a reverse postorder walk
would have helped ;)  Maybe split out the live-range lookup into a
separate function get_live_range_for_bb ()?  (Rough sketch at the end of
this mail.)

> +      for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
> +        {
> +          if (!(is_gimple_assign (gsi_stmt (si))
> +                || is_gimple_call (gsi_stmt (si))))
> +            continue;
> +          stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
> +          enum stmt_vec_info_type type
> +            = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
> +          if ((type == load_vec_info_type || type == store_vec_info_type)
> +              && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
> +            {
> +              /* For non-adjacent load/store STMT, we will potentially
> +                 convert it into:
> +
> +                   1. MASK_LEN_GATHER_LOAD (..., perm indice).
> +                   2. Continguous load/store + VEC_PERM (..., perm indice)
> +
> +                 We will be likely using one more vector variable.  */
> +              unsigned int max_point
> +                = (*program_points_per_bb.get (bbs[i])).length () - 1;
> +              auto *live_ranges = live_ranges_per_bb.get (bbs[i]);
> +              bool existed_p = false;
> +              tree var = type == load_vec_info_type
> +                           ? gimple_get_lhs (gsi_stmt (si))
> +                           : get_store_value (gsi_stmt (si));
> +              tree sel_type = build_nonstandard_integer_type (
> +                TYPE_PRECISION (TREE_TYPE (var)), 1);
> +              tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                                     get_identifier ("vect_perm"), sel_type);
> +              pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
> +              gcc_assert (!existed_p);
> +              live_range = pair (0, max_point);
> +              if (dump_enabled_p ())
> +                dump_printf_loc (MSG_NOTE, vect_location,
> +                                 "Add perm indice %T, start = 0, end = %d\n",
> +                                 sel, max_point);
> +            }
> +        }
>      }
>  }

So we're inserting a dummy vect_perm element (that's live from the
start?).  Would it make sense to instead increase the number of needed
registers for a load/store and handle this similarly to
compute_nregs_for_mode?  Maybe also do it directly in
compute_local_live_ranges and extend live_range by an nregs?
(Second sketch below.)

Regards
 Robin
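P.S.: To make the get_live_range_for_bb () suggestion a bit more
concrete, something along these lines is what I meant.  Untested, and
I'm assuming the helper can see program_points_per_bb and
live_ranges_per_bb the same way the code above does (otherwise pass
them in); name and interface are only suggestions:

static pair *
get_live_range_for_bb (basic_block bb, tree def, unsigned int *max_point)
{
  /* No program points recorded for BB means no live ranges either.  */
  if (!program_points_per_bb.get (bb))
    return NULL;
  auto *live_ranges = live_ranges_per_bb.get (bb);
  *max_point = (*program_points_per_bb.get (bb)).length () - 1;
  /* NULL if DEF has no live range in BB.  */
  return live_ranges->get (def);
}

The caller would then boil down to something like

  pair *live_range = get_live_range_for_bb (pred, def, &max_point);
  if (!live_range && single_succ_p (pred))
    /* PRED is an extended basic block, look for DEF's definition in
       its single successor instead.  */
    live_range = get_live_range_for_bb (single_succ (pred), def, &max_point);
  if (!live_range)
    continue;

(glossing over the fact that the caller may still need the successor's
live_ranges map afterwards).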
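And regarding the last point, a very rough sketch of what I had in mind
with accounting for the extra register directly instead of a dummy live
range.  The map name is made up, it would need to live alongside
program_points_per_bb/live_ranges_per_bb however you pass those around,
and I haven't tried any of this:

  /* Hypothetical: per-bb count of vector registers we need on top of
     what the live ranges give us, e.g. for the permute indices of
     non-adjacent loads/stores.  */
  hash_map<basic_block, unsigned int> extra_vregs_per_bb;

and in the loop above, instead of creating the vect_perm decl:

              if ((type == load_vec_info_type || type == store_vec_info_type)
                  && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
                /* The permute indices occupy (at least) one more vector
                   register across the whole block.  */
                extra_vregs_per_bb.get_or_insert (bbs[i]) += 1;

That count could then be added (scaled by what compute_nregs_for_mode
gives for the index mode) wherever the maximum number of live V_REGs
per bb is computed.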