[Bug target/39423] [4.6/4.7/4.8 Regression] [SH] performance regression: lost mov @(disp,Rn)

chrbr at gcc dot gnu.org Wed, 11 Jul 2012 08:25:28 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423


--- Comment #19 from chrbr at gcc dot gnu.org 2012-07-11 15:24:27 UTC ---
(In reply to comment #18)
> (In reply to comment #17)
> > Created attachment 27775 [details]
> > plus add combine
> > 
> > Here is the patch that I've been running for some time. It also uses the same
> > combine pattern matcher, but the goal of this patch was originally to fix up
> > chains of multiple mult-add instructions.
> > Optimizing the cst+reg addressing mode appears as a nice side effect.
> > Out-of-range indexes seem to be handled, as far as I can see.
> > 
> > This brings an EEMBC telecom speedup of 10%. FFMPEG code size reduced to 30%
> > on a few objects.
> > Validated on a whole Linux distribution, with only improvements (a few
> > regressions, only below noise).
> 
> Interesting.  
> BTW, do you happen to have any (runtime) numbers for GCC 4.7.x vs current GCC
> 4.8?
> 

For now I only track the 4.6 and 4.7 branches. 4.8 is moving too fast, but
I could easily cherry-pick your other SH changes (like your fix for
PR53911).

BTW, I only benchmark on SH4 and SH4A.

> > This patch is only for comments/illustration. It needs some polishing before
> > proposing. I'm having a look at your implementation to see how they compare
> > and whether they can be combined. Both approaches look interesting.
> 
> I guess folding the mul-add sequences like you did should be more useful than
> just handling one mem:SI pattern.  In any case, if you find my impl useful
> please let me know, because then I'd also pop in patterns for mem:QI and
> mem:HI.

Sure. By the way, my patch alone is not enough to fix the original problem. I
need to extract the other chunks that unleash it. Will post.
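
For readers following the thread, here is a minimal C sketch of the kind of
access being discussed. It is only an illustration: the function names are
hypothetical, and it is neither the test case from the PR nor code from either
patch. It assumes 4-byte int and relies on the SH mov.l @(disp,Rn) form, whose
displacement is a small constant (4 bits scaled by 4, so 0..60 bytes).

```c
/* Hypothetical illustration, not taken from the PR or from either patch. */

/* The address is tab + index*4 + 16: a mul-add feeding a memory access.
   The constant part (16 bytes) fits a mov.l displacement, so ideally only
   tab + index*4 is computed in a register. */
int load_offset(const int *tab, int index)
{
    return tab[index + 4];
}

/* Two loads off the same base: ideally one base computation followed by
   two displacement loads at offsets 16 and 20. */
int load_pair(const int *tab, int index)
{
    return tab[index + 4] + tab[index + 5];
}

/* Out-of-range case: 400 bytes does not fit the mov.l displacement, so the
   constant has to be added to the base register some other way. */
int load_far(const int *tab, int index)
{
    return tab[index + 100];
}
```

Comparing the assembly that an SH cross-compiler produces for these functions
at -O2 should show whether the displacement form is used in the first two
cases.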

[Bug target/39423] [4.6/4.7/4.8 Regression] [SH] performance regression: lost mov @(disp,Rn)
