https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #29 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #28)
> (In reply to Jiu Fu Guo from comment #27)
>
> > > 12: 1.2
> > > 13: 0.9
> > > 14: 0.8
> > > 15: 0.7
> > > 16: 2.1
> >
> > Find one interesting thing:
> > If using widen reading for the run which > 16 iterations, we can see the
> > performance is significantly improved (>18%) for xz_r in spec.
> > This means that the frequency is small for >16, while it still costs a big
> > part of the runtime.
>
> Oh, Recheck frequency in my test, the frequency is big (99.8%) for >16
> iterations.

The frequency for >16 iterations is small, 2.1%. The limit is generally large, but the actual number of iterations is what matters because of the early exit. The key question remains whether it is legal to assume the limit implies the memory is valid and use wider accesses.
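To make the legality question concrete, here is a minimal sketch (the function names `match_len_scalar` and `match_len_wide` are hypothetical, not from xz or GCC). The scalar loop only ever touches bytes up to the first mismatch; the widened variant reads 8 bytes per step via memcpy, so it can read up to 7 bytes past the mismatch. That is only valid if the whole [0, limit) range is known to be dereferenceable, which is exactly the assumption being debated.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Scalar reference: compare byte-by-byte, stop at first mismatch.
   Only bytes up to the mismatch (or limit) are ever read. */
size_t match_len_scalar(const uint8_t *a, const uint8_t *b, size_t limit)
{
    size_t i = 0;
    while (i < limit && a[i] == b[i])
        i++;
    return i;
}

/* Widened sketch: read 8 bytes at a time through memcpy.
   Note this may read past the first mismatching byte (up to the end
   of the current 8-byte word), so it is only legal if all of
   [0, limit) is valid memory -- the scalar loop needs no such
   guarantee. */
size_t match_len_wide(const uint8_t *a, const uint8_t *b, size_t limit)
{
    size_t i = 0;
    while (i + 8 <= limit) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, 8);
        memcpy(&wb, b + i, 8);
        if (wa != wb) {
            /* Mismatch somewhere in this word: locate it bytewise
               (endian-independent). */
            while (a[i] == b[i])
                i++;
            return i;
        }
        i += 8;
    }
    /* Scalar tail for the remaining limit % 8 bytes. */
    while (i < limit && a[i] == b[i])
        i++;
    return i;
}
```

With an early exit at, say, iteration 10 and limit 16, both versions return 10, but the wide version has already loaded bytes 10..15; whether the compiler may introduce those loads is the point at issue.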