Hi Robin,

Thanks for the review!

> Out of curiosity ... would you mind sharing whether or not any LLM
> assistance was used for parts of this contribution?

Yes — an LLM was used primarily to generate test case scaffolding
and assist with ChangeLog/comment formatting.  The core logic,
tuning thresholds, and design decisions are based on manual
analysis and SPEC measurements.

> Could use a somewhat more verbose comment here :)

Done in v2.

> How did you end up at 4?  Empirically?
> Also, unrolling 8x might be too much?

Yes, empirically.  The threshold (ninsns <= 4) and unroll factor
were determined from measurements on a multi-issue in-order RISC-V
core (C9501), yielding ~2% geomean improvement on SPEC CPU2017 INT.

I agree these values are likely uarch-dependent.  In v2, I've made
them configurable per micro-architecture via riscv_tune_param, so
each -mtune= target can specify its own small_unroll_ninsns and
small_unroll_factor.

> I think we need an accompanying change in invoke.texi

Done in v2.

> I wouldn't call 4 or more "large" :) but at exactly 4 we should
> still be unrolling?

Fixed the wording.  Yes, loops with ninsns <= 4 are unrolled.

> How is that loop special that we want to test it?

Nothing special — it's just the minimal loop to verify the unroll
path triggers.  Simplified the comment and removed the asm in v2,
using a simple accumulation loop instead.

v2 will follow shortly.

Best regards,
Wang Jue


Reply via email to