------- Comment #7 from hubicka at ucw dot cz 2009-01-15 01:49 -------
Subject: Re: [4.4 regression] performance regression of sse code from 4.2/4.3

I guess the main difference here is that a load + addps pair generates 2 uops, while a mov + loading addps pair generates 3, since the move has to go through the queue. I will try changing the testcase to fit in cache to see whether an AMD machine reproduces it too.

Honza

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
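
For illustration only, here is a rough sketch of the two instruction shapes being compared. The function names (`add_load_first`, `add_copy_first`, `lane_sum`) and the assembly in the comments are assumptions made for this example, not taken from the testcase in the PR. The compiler is free to emit either form for either function, so this shows the idea rather than reproducing the regression.

```c
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Shape A (assumed): a separate load, then a register-register addps.
 *     movaps (%rdi), %xmm0
 *     addps  %xmm1, %xmm0
 * This is the "load + addps" pair, counted as 2 uops in the comment above.
 * p must be 16-byte aligned.
 */
static __m128 add_load_first(const float *p, __m128 acc)
{
    __m128 v = _mm_load_ps(p);
    return _mm_add_ps(v, acc);
}

/* Shape B (assumed): a register copy, then addps with a memory operand.
 *     movaps %xmm1, %xmm0
 *     addps  (%rdi), %xmm0
 * This is the "mov + loading addps" pair, counted as 3 uops in the
 * comment above. p must be 16-byte aligned.
 */
static __m128 add_copy_first(const float *p, __m128 acc)
{
    __m128 t = acc;
    return _mm_add_ps(t, _mm_load_ps(p));
}

/* Hypothetical helper: sum the four lanes so results are easy to compare. */
static float lane_sum(__m128 v)
{
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

Both functions compute the same value. Any difference would show up only in the generated code, which can be inspected with `gcc -O2 -S`.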