>
> gcc/
>
> 	* config/i386/i386-expand.c (expand_set_or_cpymem_via_rep):
> 	For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, don't convert QImode
> 	to SImode.
> 	(decide_alg): For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, use
> 	"rep movsb/stosb" only for known sizes.
> 	* config/i386/i386-options.c (processor_cost_table): Use Ice
> 	Lake cost for Cannon Lake, Ice Lake, Tiger Lake, Sapphire
> 	Rapids and Alder Lake.
> 	* config/i386/i386.h (TARGET_PREFER_KNOWN_REP_MOVSB_STOSB): New.
> 	* config/i386/x86-tune-costs.h (icelake_memcpy): New.
> 	(icelake_memset): Likewise.
> 	(icelake_cost): Likewise.
> 	* config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
> 	New.
It looks like X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is quite obviously
beneficial and independent of the rest of the changes. I think we will need
to discuss the move ratio and the code size/uop cache pollution issues a bit
more; one option would be to use the increased limits for -O3 only. Can you
break this out into an independent patch?

I also wonder if it would not be more readable to special-case this right at
the beginning of decide_alg.

> @@ -6890,6 +6891,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT expected_size,
>    const struct processor_costs *cost;
>    int i;
>    bool any_alg_usable_p = false;
> +  bool known_size_p = expected_size != -1;

expected_size is not -1 if we have profile feedback and we detected the
average block size from the histogram. From the description, it seems to me
that you want the size to be an actual compile-time constant, which would be
min_size == max_size, I guess.

Honza
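The distinction above between a profile-derived expected size and a true compile-time constant can be illustrated with a minimal C sketch. It assumes only the min_size/max_size bounds mentioned in the review; host_wide_int, size_from_profile_p, known_size_p, pick_alg and the hypothetical_alg enum are hypothetical stand-ins for illustration, not GCC's actual code or types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's HOST_WIDE_INT.  */
typedef long long host_wide_int;

/* Hypothetical stand-in for the algorithm choice made in decide_alg.  */
enum hypothetical_alg { ALG_LIBCALL, ALG_REP_BYTE };

/* expected_size != -1 only means profile feedback supplied an average
   block size; it does not mean the size is known at compile time.  */
static bool
size_from_profile_p (host_wide_int expected_size)
{
  return expected_size != -1;
}

/* The size is a compile-time constant when its range collapses to a
   single value, i.e. min_size == max_size.  */
static bool
known_size_p (unsigned long long min_size, unsigned long long max_size)
{
  return min_size == max_size;
}

/* Sketch of special-casing the tuning flag at the start of the decision:
   use the "rep movsb/stosb" style strategy only for a known size, and
   otherwise fall through (here simplified to a library call).  */
static enum hypothetical_alg
pick_alg (bool prefer_known_rep_movsb_stosb,
          unsigned long long min_size, unsigned long long max_size)
{
  /* The bounds must describe a valid range.  */
  assert (min_size <= max_size);
  if (prefer_known_rep_movsb_stosb && known_size_p (min_size, max_size))
    return ALG_REP_BYTE;
  return ALG_LIBCALL;
}
```

With this check, a block whose size is only known from a profile histogram (for example an expected size of 64 but a range of 1..4096) would not take the "rep movsb/stosb" path, while a constant 64-byte copy would.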