https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #30 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Wilco from comment #29)
> (In reply to Jiu Fu Guo from comment #28)
> > >
> > > Find one interesting thing:
> > > If using widen reading for the run which > 16 iterations, we can see the
> > > performance is significantly improved(>18%) for xz_r in spec.
> > > This means that the frequency is small for >16, while it still costs a big
> > > part of the runtime.
> > >
> >
> > Oh, Recheck frequency in my test, the frequency is big (99.8%) for >16
> > iterations.
>
> The frequency for >16 iterations is small, 2.1%. The limit is generally
> large, but the actual number of iterations is what matters because of the
> early exit.

Right, the runtime depends on the number of iterations actually executed, not on the limit.

>
> The key question remains whether it is legal to assume the limit implies the
> memory is valid and use wider accesses.

If unaligned access is supported, it would be valid to access the memory.
Otherwise, a check like ((pb & 7) == (cur & 7)) would cost an additional test,
and it is not certain that the check would usually be true.
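
For illustration, here is a minimal C sketch of the "widen reading" idea and of the alignment check mentioned above. The function names (match_len_bytes, match_len_wide, same_misalignment) are hypothetical; this is not the code from xz or from any GCC patch. It assumes that both buffers are readable up to limit, which is exactly the open question in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise reference loop: returns the index of the first mismatch at or
   after len, or limit if none is found.  It exits early on a mismatch. */
static size_t match_len_bytes(const unsigned char *pb, const unsigned char *cur,
                              size_t len, size_t limit)
{
    while (len < limit && pb[len] == cur[len])
        ++len;
    return len;
}

/* Wide variant (a sketch): compares 8 bytes per step.  It may read up to
   7 bytes past the first mismatching byte, but never past limit.
   ASSUMPTION: every byte in [0, limit) of both buffers is readable.
   memcpy keeps the loads valid C whatever the alignment; whether they
   become single wide loads depends on the target and the compiler. */
static size_t match_len_wide(const unsigned char *pb, const unsigned char *cur,
                             size_t len, size_t limit)
{
    while (len + 8 <= limit) {
        uint64_t a, b;
        memcpy(&a, pb + len, sizeof a);
        memcpy(&b, cur + len, sizeof b);
        if (a != b)
            break;              /* mismatch is somewhere in this chunk */
        len += 8;
    }
    /* Locate the exact mismatch, or finish the tail, byte by byte. */
    return match_len_bytes(pb, cur, len, limit);
}

/* The extra test from the comment: non-zero when both pointers have the
   same offset within an 8-byte word, so that one peeling prologue could
   align both at once on a target without unaligned access. */
static int same_misalignment(const void *pb, const void *cur)
{
    return ((uintptr_t)pb & 7) == ((uintptr_t)cur & 7);
}
```

Both functions return the same result for the same inputs; the difference is only in how many bytes are touched after the first mismatch, which is why the validity of the memory up to limit matters.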