https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83358

--- Comment #3 from Jan Hubicka <hubicka at ucw dot cz> ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83358
> 
> --- Comment #2 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
> The following fixes this particular issue:
> 
> diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> index 312467d9788..00f1dae9085 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -2345,7 +2345,7 @@ struct processor_costs core_cost = {
>    {COSTS_N_INSNS (8),                  /* cost of a divide/mod for QI */
>     COSTS_N_INSNS (8),                  /*                          HI */
>     /* 8-11 */
> -   COSTS_N_INSNS (11),                 /*                          SI */
> +   COSTS_N_INSNS (13),                 /*                          SI */
>     /* 24-81 */
>     COSTS_N_INSNS (81),                 /*                          DI */
>     COSTS_N_INSNS (81)},                /*                         other */
> 
> Perhaps the div costs are a bit too tight in general?

The main problem here is that the algorithm expanding div/mod into shift/add/lea
sequences does not consider parallelism at all, so the cost model is not
realistic.
I meant to write a benchmark that tries different constants and checks whether
they are faster with idiv or with the expanded sequence.
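
A minimal sketch of what such a benchmark could look like (illustrative only,
not an existing testcase; the timing approach and the constant 13 are just
examples).  One loop divides by a compile-time constant, which the compiler may
expand into a multiply/shift/lea sequence; the other loads the same divisor
through a volatile so a real div instruction has to be emitted:

#include <stdio.h>
#include <time.h>

#define N 100000000u

/* volatile hides the value from the optimizer, forcing a hardware divide
   in the second loop.  */
static volatile unsigned opaque_divisor = 13;

static double
elapsed (struct timespec a, struct timespec b)
{
  return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) * 1e-9;
}

int
main (void)
{
  struct timespec t0, t1, t2;
  unsigned sink1 = 0, sink2 = 0;
  unsigned d = opaque_divisor;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (unsigned i = 1; i <= N; i++)
    sink1 += i / 13;          /* constant divisor: may be expanded */
  clock_gettime (CLOCK_MONOTONIC, &t1);
  for (unsigned i = 1; i <= N; i++)
    sink2 += i / d;           /* runtime divisor: compiled as div */
  clock_gettime (CLOCK_MONOTONIC, &t2);

  /* Print the sinks so the loops are not optimized away.  */
  printf ("constant divisor: %.3f s (sink=%u)\n", elapsed (t0, t1), sink1);
  printf ("runtime  divisor: %.3f s (sink=%u)\n", elapsed (t1, t2), sink2);
  return 0;
}

Building with -O2 and inspecting the generated code confirms which form was
emitted for a given constant.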

The original costs were still based on Pentium 4, so I brought them to be
consistently based on latencies.
Increasing the values a bit over the estimated latencies (with a comment on why
that was done) is perhaps the easiest short-term solution.
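
For illustration, that short-term fix could look roughly like this in
x86-tune-costs.h (value taken from Markus's patch above; the comment wording is
only a sketch):

   /* Estimated latency is 8-11 cycles, but the cost model for the
      shift/add/lea replacement sequence does not account for its
      parallelism, so bias the divide cost slightly above the latency
      so that profitable expansions are not rejected.  */
   COSTS_N_INSNS (13),                 /*                          SI */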

Honza
