Hi Wang Jue,
Thanks, reasonable to have this for riscv as well, after many other targets
adopted the same thing. IMHO, it is long overdue to promote
-munroll-small-loops to -funroll-small-loops...
Out of curiosity (and because the GCC project is going to get an AI policy
soon), would you mind sharing whether or not any LLM assistance was used for
parts of this contribution?
> +/* Implement TARGET_LOOP_UNROLL_ADJUST. */
Could use a somewhat more verbose comment here :)
> +
> +static unsigned
> +riscv_loop_unroll_adjust (unsigned nunroll, class loop *loop)
> +{
> + if (riscv_unroll_only_small_loops)
> + {
> + if (loop->ninsns <= 4)
> + return MIN (8, nunroll);
> + else
> + return 1;
> + }
> + return nunroll;
> +}
How did you end up at 4? Empirically? I could imagine that for riscv a
slightly higher number might make sense (than e.g. on x86).
Also, unrolling 8x might be too much? I'd also be interested in how good the
loop->ninsns number matches up with actual assembly.
Usually, for these kinds of tuning knobs, we'd like at least a SPEC run or
similar to determine good values. Maybe it also turns out that we need
different numbers per uarch (likely)?
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index c2670ad87b2..3226367bf8a 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -454,3 +454,7 @@ Enum(arcv_mpy_option) String(10c)
> Value(ARCV_MPY_OPTION_10C)
> mmpy-option=
> Target RejectNegative Joined Enum(arcv_mpy_option) Var(arcv_mpy_option)
> Init(ARCV_MPY_OPTION_2C)
> The type of MPY unit used by the RMX-100 core (to be used in combination
> with -mtune=arc-v-rmx-100-series) (default: 2c).
> +
> +munroll-only-small-loops
> +Target Var(riscv_unroll_only_small_loops) Init(0) Save
> +Enable conservative small loop unrolling.
I think we need an accompanying change in invoke.texi to document that option.
(Then "make regenerate-opt-urls")
> +/* { dg-final { scan-rtl-dump "Unrolled loop" "loop2_unroll" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/unroll-large-loop.c
> b/gcc/testsuite/gcc.target/riscv/unroll-large-loop.c
> new file mode 100644
> index 00000000000..ad470c34fdd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/unroll-large-loop.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll-details" } */
> +
> +/* Verify that a large loop (4 or more insns in the body) is NOT unrolled
I wouldn't call 4 or more "large" :) but at exactly 4 we should still be
unrolling?
> +++ b/gcc/testsuite/gcc.target/riscv/unroll-small-loop.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll-details" } */
> +
> +/* Verify -munroll-only-small-loops (default ON at -O2) unrolls a small
> + loop. The do-while form uses the counter itself as the induction
> + variable, so the RTL loop body collapses to roughly:
> + asm ; <-- empty volatile asm (kept as one insn)
> + addi n, n, -1
> + bnez n, .L
> + giving loop->ninsns <= 4 and triggering the small-loop unroll path
> + in riscv_loop_unroll_adjust. The empty volatile asm prevents the
> + loop from being deleted as dead code. */
> +
> +void
> +small_loop (int n)
> +{
> + do
> + __asm__ volatile ("");
> + while (--n);
> +}
How is that loop special that we want to test it? And what does the induction
variable have to do with it? Is this just supposed to be the smallest loop we
can write?
--
Regards
Robin