https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> --- (In reply to Jan Hubicka from comment #17) > We already have > /* This function adjusts the unroll factor based on > the hardware capabilities. For ex, bdver3 has > a loop buffer which makes unrolling of smaller > loops less important. This function decides the > unroll factor using number of memory references > (value 32 is used) as a heuristic. */ > > static unsigned > ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop) > > which triggers with TARGET_ADJUST_UNROLL > /* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based > > on hardware capabilities. Bdver3 hardware has a loop buffer which makes > > unrolling small loop less important. For, such architectures we adjust > > the unroll factor so that the unrolled loop fits the loop buffer. */ > > DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | > m_BDVER4) > > so perhaps what you propose can be done by making this one more general? Hmm. Both i386 and s390x expect the function to be in RTL form for this hook so it doesn't apply on GIMPLE (and gimple unrolling). If we'd make it more general, thus handle both IL forms and instead of returning an unroll factor return a maximum growth factor, not maximum size to avoid comparing apples (size unit of target) and oranges (size unit of consumer) then it could indeed work. So sth like /* Return the maximum percentage LOOP is allowed to grow to or -1U for no target specific constraints. */ unsigned targetm.loop_growth_constraint (loop *loop); the existing uses in loop-unroll.c if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); would then become sth like if (targetm.loop_unroll_adjust) nunroll = MIN (nunroll, targetm.loop_unroll_adjust (loop) / 100); and existing hooks would have an early out if (in_gimple_form) return -1U; and their current return value simply multiplied by 100? Note the current x86 implementation does /* Count the number of memory references within the loop body. This value determines the unrolling factor for bdver3 and bdver4 architectures. */ ... if (mem_count && mem_count <=32) return 32/mem_count; return nunroll; and thus returns 32 for mem_count == 1 and any nunroll specified by the caller (like if it specified 2 because of some user --param). That looks bougs to me and we instead want to return MIN (nunroll, 32 / mem_count)? The s390 implementation gets this right.