[ Returning to an old discussion... ]


On 8/19/24 3:50 AM, Richard Sandiford wrote:
Jeff Law <jeffreya...@gmail.com> writes:
On 8/12/24 3:50 PM, Jeff Law wrote:


On 8/12/24 1:49 PM, Richard Sandiford wrote:

-      regno = subreg_regno (x);
+      /* A paradoxical should always be REGNO (y) + 0.  Using
subreg_regno
+         for something like (subreg:DI (reg:SI N) 0) on a
WORDS_BIG_ENDIAN
+         target will return N-1 which is catastrophic for N == 0 and
just
+         wrong for other cases.
+
+         Fixing subreg_regno would be a better option, except that
reload
+         depends on its current behavior.  */
+      if (paradoxical_subreg_p (x))
+        regno = REGNO (y);
+      else
+        regno = subreg_regno (x);

Are you sure that's right?  For a 32-bit big-endian target,
(subreg:DI (reg:SI 1) 0) really should simplify to (reg:DI 0) rather
than (reg:DI 1).
Correct, we want to get (reg:DI 0).  We get "0" back from REGNO (y). And
we get 0 back from byte_lowpart_offset (remember, it's paradoxical).
   The sum is 0 resulting in (reg:DI 0).
So rewinding this discussion a bit.

Focusing on this insn:

(insn 77 75 80 6 (parallel [
             (set (reg:DI 75 [ _32 ])
                 (plus:DI (reg:DI 73 [ _31 ])
                     (subreg:DI (reg/v:SI 41 [ __n ]) 0)))
             (clobber (scratch:SI))
         ]) "j.C":50:38 discrim 1 155 {adddi3}
      (expr_list:REG_DEAD (reg:DI 73 [ _31 ])
         (expr_list:REG_DEAD (reg/v:SI 41 [ __n ])
             (nil))))

Not surprisingly we're focused on the subreg expression in there.

The first checkpoint in my mind is IRA's allocation where we assign it
to reg 0.


       Popping a0(r41,l0)  --         assign reg 0


So given the use inside a paradoxical subreg, do we consider this valid?

After the discussion from last week, I'm leaning a bit more towards no
than before.

I thought it wasn't valid.  AIUI, there are two mechanisms that try
to prevent it:

- valid_mode_changes_for_regno, which says which hard registers can
   form all subregs required by a pseudo.  This is only used to restrict
   class choices though, rather than forbid individual registers.

- This code in ira_build_conflicts:

          /* Now we deal with paradoxical subreg cases where certain registers
             cannot be accessed in the widest mode.  */
          machine_mode outer_mode = ALLOCNO_WMODE (a);
          machine_mode inner_mode = ALLOCNO_MODE (a);
          if (paradoxical_subreg_p (outer_mode, inner_mode))
            {
              enum reg_class aclass = ALLOCNO_CLASS (a);
              for (int j = ira_class_hard_regs_num[aclass] - 1; j >= 0; --j)
                {
                   int inner_regno = ira_class_hard_regs[aclass][j];
                   int outer_regno = simplify_subreg_regno (inner_regno,
                                                            inner_mode, 0,
                                                            outer_mode);
                   if (outer_regno < 0
                       || !in_hard_reg_set_p (reg_class_contents[aclass],
                                              outer_mode, outer_regno))
                     {
                       SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj),
                                         inner_regno);
                       SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj),
                                         inner_regno);
                     }
                }
            }

   which operates at the level of individual registers.

So yeah, I think the first question is why ira_build_conflicts isn't
kicking in for this register or (if it is) why we still get register 0.
So pulling on this thread leads me into the code that sets up ALLOCNO_WMODE in create_insn_allocnos:

          if ((a = ira_curr_regno_allocno_map[regno]) == NULL)
            {
              a = ira_create_allocno (regno, false, ira_curr_loop_tree_node);
              if (outer != NULL && GET_CODE (outer) == SUBREG)
                {
                  machine_mode wmode = GET_MODE (outer);
                  if (partial_subreg_p (ALLOCNO_WMODE (a), wmode))
                    ALLOCNO_WMODE (a) = wmode;
                }
            }
Note how we only set ALLOCNO_MODE only at allocno creation, so it'll work as intended if and only if the first reference is via a SUBREG.

ISTM the fix here is to always do the check and set ALLOCNO_WMODE.

The other bug I see is that we may potentially have paradoxicals in different modes. ie, on a 32 bit target, we could in theory have a paradoxical in DI and another in TI. So in addition to pulling that code out of the conditional so that it executes every time, the assignment would look like

if (partial_subreg_p (ALLCONO_WMODE (a), wmode)
    && wmode > ALLOCNO_WMODE (a))
  ALLOCNO_WMODE (a) = wmode;

Or something along those lines.

And it all makes sense that you caught this. You and another colleague at ARM were trying to address this exact problem ~11 years ago ;-)

Thoughts?

jeff



Reply via email to