[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

guojiufu at gcc dot gnu.org Wed, 27 May 2020 06:07:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398


--- Comment #30 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Wilco from comment #29)
> (In reply to Jiu Fu Guo from comment #28)
> > > 
> > > Find one interesting thing:
> > > If using widen reading for the run which > 16 iterations, we can see the
> > > performance is significantly improved(>18%) for xz_r in spec.
> > > This means that the frequency is small for >16, while it still costs a big
> > > part of the runtime.
> > > 
> > 
> > Oh, Recheck frequency in my test, the frequency is big (99.8%) for >16
> > iterations.
> 
> The frequency for >16 iterations is small, 2.1%. The limit is generally
> large, but the actual number of iterations is what matters because of the
> early exit.
Right, the runtime depends on the number of iterations actually executed.
> 
> The key question remains whether it is legal to assume the limit implies the
> memory is valid and use wider accesses.
If unaligned access is supported, it would be valid to access the memory.
Otherwise, a check like ((pb & 7) == (cur & 7)) would cost an additional test,
and it is not certain that the check would usually be true.
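
To make the trade-off concrete, here is a minimal hand-written sketch of the
alignment check, not what the vectorizer emits. The names match_len_bytes and
match_len_wide are hypothetical, and the sketch assumes that all `limit` bytes
of both buffers are readable, which is exactly the assumption under discussion.
The wide path is taken only when the two pointers share the same offset within
an 8-byte word; otherwise the code falls back to the byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time reference: count matching leading bytes, up to limit. */
static size_t match_len_bytes(const uint8_t *pb, const uint8_t *cur,
                              size_t limit)
{
    size_t len = 0;
    while (len < limit && pb[len] == cur[len])
        len++;
    return len;
}

/* Hypothetical wide variant: compares 8 bytes per step when both pointers
   have the same alignment modulo 8. Assumes all `limit` bytes of both
   buffers are readable; it never reads at or beyond `limit`. */
static size_t match_len_wide(const uint8_t *pb, const uint8_t *cur,
                             size_t limit)
{
    size_t len = 0;

    /* The extra test mentioned above. */
    if (((uintptr_t)pb & 7) == ((uintptr_t)cur & 7)) {
        /* Step by bytes until both pointers are 8-byte aligned. */
        while (len < limit && ((uintptr_t)(pb + len) & 7) != 0) {
            if (pb[len] != cur[len])
                return len;
            len++;
        }
        /* Compare whole 8-byte words while at least 8 bytes remain. */
        while (limit - len >= 8) {
            uint64_t a, b;
            memcpy(&a, pb + len, sizeof a);
            memcpy(&b, cur + len, sizeof b);
            if (a != b)
                break; /* the byte loop below locates the mismatch */
            len += 8;
        }
    }

    /* Tail, mismatching word, or differently aligned pointers. */
    while (len < limit && pb[len] == cur[len])
        len++;
    return len;
}
```

Both functions return the same result. The wide one pays for one alignment
test up front, and that test only helps when it succeeds.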

