https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #17)
> We already have
> /* This function adjusts the unroll factor based on
>    the hardware capabilities. For ex, bdver3 has
>    a loop buffer which makes unrolling of smaller
>    loops less important. This function decides the
>    unroll factor using number of memory references
>    (value 32 is used) as a heuristic. */
> 
> static unsigned
> ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
> 
> which triggers with TARGET_ADJUST_UNROLL
> /* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based   
> 
>    on hardware capabilities. Bdver3 hardware has a loop buffer which makes  
> 
>    unrolling small loop less important. For, such architectures we adjust   
> 
>    the unroll factor so that the unrolled loop fits the loop buffer.  */    
> 
> DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 |
> m_BDVER4)  
> 
> so perhaps what you propose can be done by making this one more general?

Hmm.  Both i386 and s390x expect the function to be in RTL form for this hook
so it doesn't apply on GIMPLE (and gimple unrolling).

If we'd make it more general, thus handle both IL forms and instead of
returning an unroll factor return a maximum growth factor, not maximum
size to avoid comparing apples (size unit of target) and oranges (size
unit of consumer) then it could indeed work.  So sth like

/* Return the maximum percentage LOOP is allowed to grow to or -1U for
   no target specific constraints.  */
unsigned targetm.loop_growth_constraint (loop *loop);

the existing uses in loop-unroll.c

  if (targetm.loop_unroll_adjust)
    nunroll = targetm.loop_unroll_adjust (nunroll, loop);

would then become sth like

  if (targetm.loop_unroll_adjust)
    nunroll = MIN (nunroll, targetm.loop_unroll_adjust (loop) / 100);

and existing hooks would have an early out

  if (in_gimple_form)
     return -1U;

and their current return value simply multiplied by 100?

Note the current x86 implementation does

  /* Count the number of memory references within the loop body.
     This value determines the unrolling factor for bdver3 and bdver4
     architectures. */
...
  if (mem_count && mem_count <=32)
    return 32/mem_count;

  return nunroll;

and thus returns 32 for mem_count == 1 and any nunroll specified by the
caller (like if it specified 2 because of some user --param).  That
looks bougs to me and we instead want to return MIN (nunroll, 32 / mem_count)?
The s390 implementation gets this right.

Reply via email to