Richard Sandiford writes:
> Richard Sandiford writes:
>> Revital Eres writes:
>>> btw, do you also have numbers of how much SMS (hopefully) improves
>>> performance on top of the vectorized code?
>>
>> OK, here's a comparison of:
>>
>> -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectoriz
On Thu, Aug 25, 2011 at 09:17:59AM +0100, Richard Sandiford wrote:
> Revital Eres writes:
> > btw, do you also have numbers of how much SMS (hopefully) improves
> > performance on top of the vectorized code?
>
> OK, here's a comparison of:
>
> -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -m
Hi,
Thanks again for measuring this.
> mjpegenc
> before: 50 runs take 7.31085s
> after: 50 runs take 3.04492s
> speedup: x2.4
mjpegenc and aacsbr-2 contain simple accumulation without
load/store dependence, and thus SMS succeeds in improving them.
aacsbr-1 also contains such accumul
Richard Sandiford writes:
> Revital Eres writes:
>> btw, do you also have numbers of how much SMS (hopefully) improves
>> performance on top of the vectorized code?
>
> OK, here's a comparison of:
>
> -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
> -fno-auto-inc
Hi,
>> Yes, I also noticed that. When I tested it only one reg-move was
>> created, so the scheduling patch would not affect it.
>
> FWIW, looking at the results I posted yesterday, the scheduling patch
> did improve the results compared with the non-scheduling patch:
You are right! This was my
Revital Eres writes:
>> mjpegenc is another case where SMS generates lots of spilling while the
>> normal scheduler doesn't.
>
> Yes, I also noticed that. When I tested it only one reg-move was
> created, so the scheduling patch would not affect it.
FWIW, looking at the results I posted yesterd
Hi,
>> btw, do you also have numbers of how much SMS (hopefully) improves
>> performance on top of the vectorized code?
>
> OK, here's a comparison of:
Thanks. I expected more improvements in aacsbr-2 as I see without the
vectorizer options... will look into that.
>
> mjpegenc is another case whe
Revital Eres writes:
> btw, do you also have numbers of how much SMS (hopefully) improves
> performance on top of the vectorized code?
OK, here's a comparison of:
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad
-fno-auto-inc-dec
vs:
-mcpu=cortex-a8 -mfpu=ne
Hi Richard,
> The effect on my flawed libav microbenchmarks was much greater
> than I imagined. I used the options:
Yeah, that indeed looks impressive!
btw, do you also have numbers of how much SMS (hopefully) improves
performance on top of the vectorized code?
Thanks,
Revital
> -mcpu=cor
Following on from yesterday's call about what it would take to enable
SMS by default: one of the problems I was seeing with the SMS+IV patch
was that we ended up with excessive moves. E.g. a loop such as:
void
foo (int *__restrict a, int n)
{
  int i;
  for (i = 0; i < n; i +