https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #35 from wschmidt at linux dot ibm.com ---
Hi Jeff,

Just a quick comment.  We should never discuss raw runtimes of SPEC benchmarks on Power hardware in public.  It's okay to talk about improvements (>12% in this case), but not wall clock time.  Not a big deal, but there are some legal reasons regarding SPEC that cause us to be a little careful.

Thanks!
Bill

On 5/21/20 12:29 AM, guojiufu at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
>
> --- Comment #26 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
> Had a test on spec2017 xz_r by changing the specified loop manually, on
> ppc64le.
>
> original loop (this loops occur three times in code):
> while (++len != len_limit)
>   if (pb[len] != cur[len])
>     break;
> changed to loop:
> typedef long long __attribute__((may_alias)) TYPEE;
>
> for(++len; len + sizeof(TYPEE) <= len_limit; len += sizeof(TYPEE)) {
>   long long a = *((TYPEE*)(cur+len));
>   long long b = *((TYPEE*)(pb+len));
>   if (a != b) {
>     break; //to optimize len can be move forward here.
>   }
> }
> for (;len != len_limit; ++len)
>   if (pb[len] != cur[len])
>     break;
>
> We can see xz_r runtime improved from 433s to 382s(>12%).
> It would be very valuable to do this kind of widening reading/checking.
>
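
For readers who want to try the transformation from comment #26 in isolation, here is a minimal, self-contained sketch of the two loop shapes side by side. The function names `match_len_bytewise` and `match_len_wide` are hypothetical, chosen only for this illustration; they do not come from xz or from GCC. The sketch assumes `len < len_limit` on entry and that both buffers hold at least `len_limit` bytes. It uses `memcpy` into a `uint64_t` for the wide loads rather than the `may_alias` typedef from the comment; that is a swapped-in, portable way to express an unaligned 8-byte read, not what the comment's author wrote.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time reference, same shape as the original loop in comment #26.
   Assumes len < len_limit on entry. */
static size_t match_len_bytewise(const unsigned char *pb,
                                 const unsigned char *cur,
                                 size_t len, size_t len_limit)
{
    while (++len != len_limit)
        if (pb[len] != cur[len])
            break;
    return len;
}

/* Widened variant (hypothetical helper): compare 8 bytes per iteration,
   then finish with the byte loop. memcpy stands in for the may_alias
   load used in the comment. */
static size_t match_len_wide(const unsigned char *pb,
                             const unsigned char *cur,
                             size_t len, size_t len_limit)
{
    for (++len; len + sizeof(uint64_t) <= len_limit; len += sizeof(uint64_t)) {
        uint64_t a, b;
        memcpy(&a, cur + len, sizeof a);
        memcpy(&b, pb + len, sizeof b);
        if (a != b)
            break;  /* the first mismatch lies inside this 8-byte chunk */
    }

    /* Tail: locate the exact mismatching byte, or run to len_limit. */
    for (; len != len_limit; ++len)
        if (pb[len] != cur[len])
            break;
    return len;
}
```

Both functions should return the same index for any input that meets the stated precondition; the wide version only reduces the number of comparisons on long matching runs. The "len can be move forward here" remark in the quoted code presumably refers to computing the mismatch position directly from the two words (on a little-endian target such as ppc64le, for instance, from the trailing zero count of `a ^ b`) instead of rescanning the chunk bytewise; the sketch leaves that out to stay close to the quoted loop.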