[Bug tree-optimization/87008] [8/9 Regression] gimple mem-to-mem assignment badly optimized

rguenth at gcc dot gnu.org Wed, 22 Aug 2018 02:32:44 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87008


--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #1)
> struct A { double a, b; };
> struct B : A {};
> template<class T>void cp(T&a,T const&b){a=b;}
> double f(B x){
>   B y; cp<A>(y,x);
>   B z; cp<A>(z,x);
>   return y.a - z.a;
> }
> 
> This is not quite equivalent because RTL manages to optimize this case, but
> gimple, with -Ofast, still gets the ugly:
> 
>   ISRA.1 = MEM[(const struct A &)&x];
>   SR.9_9 = MEM[(struct A *)&ISRA.1];
>   ISRA.1 = MEM[(const struct A &)&x];
>   SR.8_10 = MEM[(struct A *)&ISRA.1];
>   _3 = SR.9_9 - SR.8_10;
>   return _3;
> 
> Writing cp<B> instead of cp<A> makes it work, and the main difference starts
> in SRA. I expect (didn't check) this is another victim of r255510 .

The initial IL is too convoluted for early FRE to figure out the equivalences.
In the IL quoted above, which is what late FRE sees, the problem is the
redundant aggregate copy that gets in the way: early FRE does not detect it,
and FRE in general does not try to detect or remove redundant aggregate
copies because we don't really "value-number" aggregate stores.
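
As a rough illustration of the difference (a sketch only, not taken from the
bug report: the names f_aggregate and f_scalar are hypothetical, and no claim
is made about what any particular GCC version emits for either function),
compare the testcase with a hand-scalarized variant. In the second function
both reads are plain scalar loads of x.a, which is the kind of redundancy
scalar value numbering is designed to catch; in the first, the two reads are
separated by whole-struct copies.

```cpp
// Hypothetical illustration; f_aggregate and f_scalar are made-up names.
struct A { double a, b; };
struct B : A {};

template <class T>
void cp(T &a, T const &b) { a = b; }

// Same shape as the testcase from comment #1: the A subobject of x is
// copied twice as a whole aggregate, then one field of each copy is read.
double f_aggregate(B x) {
  B y; cp<A>(y, x);
  B z; cp<A>(z, x);
  return y.a - z.a;
}

// Hand-scalarized variant: no aggregate copies, only two scalar loads of
// the same field, so the subtraction has two visibly identical operands.
double f_scalar(B x) {
  double ya = x.a;
  double za = x.a;
  return ya - za;
}
```

Both functions compute the same result for finite inputs; only the shape of
the intermediate copies differs.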

To sum it up: aggregate copies are bad ;)  But they also sometimes
help: none of the vn_reference_lookup_3 tricks would work without
them, unless you end up with store pieces that always fully cover all
downstream loads.
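
A sketch of what "store pieces that fully cover the load" means in source
terms (hypothetical example: Pair, through_copy and wide_after_pieces are
made-up names, and this does not claim how any specific GCC version treats
either function). In the first function the load of tmp.hi sits behind a
single aggregate copy, so a lookup can in principle be translated to a load
from src. In the second, the copy is split into two 4-byte stores and the
8-byte load that follows is not covered by either store on its own.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical type for illustration; two 4-byte fields, no padding.
struct Pair { std::uint32_t lo, hi; };
static_assert(sizeof(Pair) == sizeof(std::uint64_t), "layout assumption");

// One aggregate copy, then a field load from the copy.
std::uint32_t through_copy(const Pair &src) {
  Pair tmp = src;
  return tmp.hi;
}

// The copy is done piecewise; the later 8-byte load spans both stores,
// so no single store piece covers it.
std::uint64_t wide_after_pieces(const Pair &src) {
  Pair tmp;
  tmp.lo = src.lo;
  tmp.hi = src.hi;
  std::uint64_t wide;
  std::memcpy(&wide, &tmp, sizeof wide);
  return wide;
}
```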

