https://gcc.gnu.org/g:e2274cdb525545acc30055e7a420ab0b3308d34c

commit e2274cdb525545acc30055e7a420ab0b3308d34c
Author: Jeff Law <j...@ventanamicro.com>
Date:   Wed Oct 30 07:43:22 2024 -0600

    [RISC-V] Aggressively hoist VXRM assignments
    
    So a while back I was looking at pixel_avg for RISC-V where we try to
    use vaaddu for the halfword-ceiling-average step.  The problem with
    vaaddu is that you must set VXRM to a suitable rounding mode as it has
    an undetermined state at function entry or after a function call.
    
    It turns out some designs will fully flush their pipelines on a write to
    VXRM which you can imagine is incredibly expensive.
    
    VXRM assignments are handled by an LCM based algorithm to find "optimal"
    placement points based on what insns in the stream need VXRM assignments
    and the particular mode they need.
    
    Unfortunately in pixel_avg an LCM algorithm only allows hoisting out of
    the innermost loop, but not the outer loop.  The core issue is that LCM
    does not allow any speculation and there are paths which would bypass
    the inner loop (which don't actually trigger at runtime IIRC).
    
    The expectation is that VXRM assignments should be exceedingly rare and
    needing more than one mode even rarer.  So hoisting more aggressively
    seems like a reasonable thing to do, but we don't want to burn too much
    time trying to do something fancy.
    
    So what this patch does is scan the IL once collecting any VXRM needs.
    If the current function has precisely one VXRM mode needed, then we
    pretend (for the sake of LCM) that the first instruction in the function
    also has that need.
    
    By doing so the VXRM assignment is essentially anticipated everywhere in
    the function.  The standard LCM algorithm is run and has enough
    information to hoist the VXRM assignment more aggressively, most often
    to the prologue.
    
    This helps the BPI in a measurable way (IIRC it was 2-3%).  It probably
    helps some of the SiFive designs, but I've been told they still benefit
    from the longer sequence of shifts & adds, hoisting just isn't enough
    for those designs.  The Ventana design basically doesn't care where the
    VXRM assignment is.  Point is we may want to have a tuning knob for the
    patterns which need VXRM (vaadd[u], vasub[u]) at some point in the near
    future.
    
    Bootstrapped and regression tested on riscv64 and regression tested on
    riscv32-elf and riscv64-elf.  We've been using this internally for a
    while a while on spec as well.   Obviously I'll wait for the pre-commit
    tester to do its thing.
    
    gcc/
            * config/riscv/riscv.cc (singleton_vxrm_need): New function.
            (riscv_mode_needed): See if there is a singleton need and if so,
            claim it happens on the first insn in the chain.
    
    (cherry picked from commit a65e1487cda969e4763ae84577bf3e0d9e2b34aa)

Diff:
---
 gcc/config/riscv/riscv.cc | 69 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f0d274653146..912ae3ac0129 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11806,6 +11806,65 @@ riscv_frm_mode_needed (rtx_insn *cur_insn, int code)
   return mode;
 }
 
+/* If the current function needs a single VXRM mode, return it.  Else
+   return VXRM_MODE_NONE.
+
+   This is called on the first insn in the chain and scans the full function
+   once to collect VXRM mode settings.  If a single mode is needed, it will
+   often be better to set it once at the start of the function rather than
+   at an anticipation point.  */
+static int
+singleton_vxrm_need (void)
+{
+  /* Only needed for vector code.  */
+  if (!TARGET_VECTOR)
+    return VXRM_MODE_NONE;
+
+  /* If ENTRY has more than once successor, then don't optimize, just to
+     keep things simple.  */
+  if (EDGE_COUNT (ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs) > 1)
+    return VXRM_MODE_NONE;
+
+  /* Walk the IL noting if VXRM is needed and if there's more than one
+     mode needed.  */
+  bool found = false;
+  int saved_vxrm_mode;
+  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      if (!INSN_P (insn) || DEBUG_INSN_P (insn))
+       continue;
+
+      int code = recog_memoized (insn);
+      if (code < 0)
+       continue;
+
+      int vxrm_mode = get_attr_vxrm_mode (insn);
+      if (vxrm_mode == VXRM_MODE_NONE)
+       continue;
+
+      /* If this is the first VXRM need, note it.  */
+      if (!found)
+       {
+         saved_vxrm_mode = vxrm_mode;
+         found = true;
+         continue;
+       }
+
+      /* Not the first VXRM need.  If this is different than
+        the saved need, then we're not going to be able to
+        optimize and we can stop scanning now.  */
+      if (saved_vxrm_mode != vxrm_mode)
+       return VXRM_MODE_NONE;
+
+      /* Same mode as we've seen, keep scanning.  */
+    }
+
+  /* If we got here we scanned the whole function.  If we found
+     some VXRM state, then we can optimize.  If we didn't find
+     VXRM state, then there's nothing to optimize.  */
+  return found ? saved_vxrm_mode : VXRM_MODE_NONE;
+}
+
 /* Return mode that entity must be switched into
    prior to the execution of insn.  */
 
@@ -11817,6 +11876,16 @@ riscv_mode_needed (int entity, rtx_insn *insn, 
HARD_REG_SET)
   switch (entity)
     {
     case RISCV_VXRM:
+      /* If CUR_INSN is the first insn in the function, then determine if we
+        want to signal a need in ENTRY->succs to allow for aggressive
+        elimination of subsequent sets of VXRM.  */
+      if (insn == get_first_nonnote_insn ())
+       {
+         int need = singleton_vxrm_need ();
+         if (need != VXRM_MODE_NONE)
+           return need;
+       }
+
       return code >= 0 ? get_attr_vxrm_mode (insn) : VXRM_MODE_NONE;
     case RISCV_FRM:
       return riscv_frm_mode_needed (insn, code);

Reply via email to