Hi Robin, Thanks for the review!
> Out of curiosity ... would you mind sharing whether or not any LLM > assistance was used for parts of this contribution? Yes — an LLM was used primarily to generate test case scaffolding and assist with ChangeLog/comment formatting. The core logic, tuning thresholds, and design decisions are based on manual analysis and SPEC measurements. > Could use a somewhat more verbose comment here :) Done in v2. > How did you end up at 4? Empirically? > Also, unrolling 8x might be too much? Yes, empirically. The threshold (ninsns <= 4) and unroll factor were determined from measurements on a multi-issue in-order RISC-V core (C9501), yielding ~2% geomean improvement on SPEC CPU2017 INT. I agree these values are likely uarch-dependent. In v2, I've made them configurable per micro-architecture via riscv_tune_param, so each -mtune= target can specify its own small_unroll_ninsns and small_unroll_factor. > I think we need an accompanying change in invoke.texi Done in v2. > I wouldn't call 4 or more "large" :) but at exactly 4 we should > still be unrolling? Fixed the wording. Yes, loops with ninsns <= 4 are unrolled. > How is that loop special that we want to test it? Nothing special — it's just the minimal loop to verify the unroll path triggers. Simplified the comment and removed the asm in v2, using a simple accumulation loop instead. v2 will follow shortly. Best regards, Wang Jue
