Hi Juzhe,

> +/* Get STORE value.  */
> +static tree
> +get_store_value (gimple *stmt)
> +{
> +  if (is_gimple_call (stmt) && gimple_call_internal_p (stmt))
> +    {
> +      if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
> +     return gimple_call_arg (stmt, 3);
> +      else
> +     gcc_unreachable ();
> +    }
> +  else
> +    return gimple_assign_rhs1 (stmt);
> +}

This is something I was about to mention in the review of v1
that I had already started.  This is better now.

> +           basic_block bb = e->src;

Rename to pred or so?  And then keep the original bb.

>             if (!live_range)
> -             continue;
> +             {
> +               if (single_succ_p (e->src))
> +                 {
> +                   /*
> +                      <bb 7> [local count: 850510900]:
> +                      goto <bb 3>; [100.00%]
> +
> +                      Handle this case, we should extend live range of bb 3.
> +                   */

/* If the predecessor is an extended basic block extend it and look for
   DEF's definition again.  */

> +                   bb = single_succ (e->src);
> +                   if (!program_points_per_bb.get (bb))
> +                     continue;
> +                   live_ranges = live_ranges_per_bb.get (bb);
> +                   max_point
> +                     = (*program_points_per_bb.get (bb)).length () - 1;
> +                   live_range = live_ranges->get (def);
> +                   if (!live_range)
> +                     continue;
> +                 }
> +               else
> +                 continue;
> +             }

We're approaching a complexity where reverse postorder would have
helped ;)  Maybe split out the live-range lookup into a separate
function get_live_range_for_bb ()?
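Roughly the shape I have in mind — a standalone sketch only, with std::map
and plain pairs standing in for the hash_map types in the patch, and all
names (bb_live_ranges, ranges_per_bb, ...) made up for illustration:

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// (start, end) program points of a local live range.
using live_range = std::pair<unsigned, unsigned>;
// Per-basic-block map from SSA name to its live range.
using bb_live_ranges = std::map<std::string, live_range>;

// Look DEF up in BB's live ranges; if it is not there and BB has a
// single successor (the forwarder-block case above), retry once in
// that successor.  Returns the block the range was found in together
// with the range, or nullopt when neither block has one.
static std::optional<std::pair<int, live_range>>
get_live_range_for_bb (const std::map<int, bb_live_ranges> &ranges_per_bb,
                       int bb, std::optional<int> single_succ,
                       const std::string &def)
{
  for (int cur = bb;;)
    {
      auto bb_it = ranges_per_bb.find (cur);
      if (bb_it != ranges_per_bb.end ())
        {
          auto range_it = bb_it->second.find (def);
          if (range_it != bb_it->second.end ())
            return std::make_pair (cur, range_it->second);
        }
      // Fall through the forwarder at most once.
      if (cur == bb && single_succ && *single_succ != bb)
        cur = *single_succ;
      else
        return std::nullopt;
    }
  }
```

That would keep the main loop down to a single lookup call and a
`continue` when it returns nothing.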

> +      for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
> +     {
> +       if (!(is_gimple_assign (gsi_stmt (si))
> +             || is_gimple_call (gsi_stmt (si))))
> +         continue;
> +       stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
> +       enum stmt_vec_info_type type
> +         = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
> +       if ((type == load_vec_info_type || type == store_vec_info_type)
> +           && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
> +         {
> +           /* For non-adjacent load/store STMT, we will potentially
> +              convert it into:
> +
> +                1. MASK_LEN_GATHER_LOAD (..., perm indice).
> +                2. Continguous load/store + VEC_PERM (..., perm indice)
> +
> +             We will be likely using one more vector variable.  */
> +           unsigned int max_point
> +             = (*program_points_per_bb.get (bbs[i])).length () - 1;
> +           auto *live_ranges = live_ranges_per_bb.get (bbs[i]);
> +           bool existed_p = false;
> +           tree var = type == load_vec_info_type
> +                        ? gimple_get_lhs (gsi_stmt (si))
> +                        : get_store_value (gsi_stmt (si));
> +           tree sel_type = build_nonstandard_integer_type (
> +             TYPE_PRECISION (TREE_TYPE (var)), 1);
> +           tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                                  get_identifier ("vect_perm"), sel_type);
> +           pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
> +           gcc_assert (!existed_p);
> +           live_range = pair (0, max_point);
> +           if (dump_enabled_p ())
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                              "Add perm indice %T, start = 0, end = %d\n",
> +                              sel, max_point);
> +         }
> +     }
>      }
>  }

So we're inserting a dummy vect_perm element (that's live from the start?).
Would it make sense to instead increase the number of registers needed for
a load/store and handle this similarly to compute_nregs_for_mode?
Maybe even do it directly in compute_local_live_ranges and extend the
live range by nregs?
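To illustrate what I mean — a toy model only, with made-up names
(account_stmt, the pressure map) and none of the actual vectorizer types:
instead of a whole-block dummy "vect_perm" range, a non-adjacent
load/store just demands nregs more registers over its own live range,
the way compute_nregs_for_mode scales demand per mode.

```cpp
#include <map>

// Toy model of per-program-point register pressure.  All names are
// hypothetical; this only sketches the accounting idea.
static void
account_stmt (std::map<unsigned, unsigned> &pressure,
              unsigned start, unsigned end, unsigned nregs,
              bool non_adjacent_p)
{
  // A non-adjacent load/store needs one extra vector (the permute
  // indices), live exactly as long as the loaded/stored value itself
  // rather than over the whole block.
  unsigned demand = nregs + (non_adjacent_p ? nregs : 0);
  for (unsigned point = start; point <= end; ++point)
    pressure[point] += demand;
}
```

That way the extra pressure is attributed to the statement that causes
it and automatically gets the right extent.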

Regards
 Robin
