On Wed, Mar 8, 2023 at 04:45, Michael Niedermayer <[email protected]> wrote:
>
> On Tue, Mar 07, 2023 at 05:08:27PM +0800, Junxian Zhu wrote:
> > From: Junxian Zhu <[email protected]>
> >
> > Rewrite the mid_pred function in the generic mathops.h to reduce branch
> > jumps and improve performance. Because recent compiler versions can
> > generate assembly for these functions that is as short as the handwritten
> > version, remove the MIPS-specific optimized inline-assembly mathops.h.
>
> You write that it improves performance.
> What speed effect does this have exactly?
> Thanks
>
I measured the performance with the following test program:
```
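#include <stdio.h>
#include <time.h>
#include <stdlib.h>

/* Fully parenthesized so that expression arguments expand safely. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))

/* Build with -DOLD=1 to time the old branchy version;
 * otherwise the new min/max version is compiled. */
int mid_pred(int a, int b, int c)
{
#if OLD
    if (a > b) {
        if (c > b) {
            if (c > a) b = a;
            else       b = c;
        }
    } else {
        if (b > c) {
            if (c > a) b = c;
            else       b = a;
        }
    }
    return b;
#else
    int t0, t1, t2, t3;
    t0 = (a > b) ? b : a;       /* min(a, b) */
    t1 = (a > b) ? a : b;       /* max(a, b) */
    t2 = (t0 > c) ? t0 : c;     /* max(min(a, b), c) */
    t3 = (t1 > t2) ? t2 : t1;   /* min(max(a, b), t2) = median */
    return t3;
#endif
}

int main(void)
{
    int a[1024], b[1024], c[1024], d[1024];
    int j; /* declared outside the loop so it can be printed afterwards */

    srand(time(NULL));
    for (int i = 0; i < 1024; i++) {
        a[i] = rand();
        b[i] = rand();
        c[i] = rand();
    }
    /* rand() in the loop bound and in the printf index makes the
     * result depend on run-time values */
    for (j = 0; j < 1e7 + rand() % 2; j++)
        for (int i = 0; i < 1024; i++)
            d[i] = mid_pred(a[i], b[i], c[i]);
    printf("%d, %d\n", d[rand() % 1024], j);
    return 0;
}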
```
Results (run time of the old and the new code; lower is better):

Platform                                                         Compiler            Old code  New code
Apple M1, macOS 13.2                                             (not stated)        2.1s      2.3s
Cavium ThunderX / arm64                                          GCC 10.2.1 -O3      52.7s     37.8s
Loongson 3A4000 / mips64el                                       GCC 10.2.1 -O3      90s       5s
Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz                        GCC 10.2.1 -O3      14.4s     15.4s
SF19A2890 / MIPS interAptiv                                      GCC 10.2.1 -O3      314s      39.3s
Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz                        GCC 12.2.0 -O3      14.4s     8.8s
sifive,bullet0 / rv64imafdc                                      GCC 12.2.0 -O3 (*)  11.9s     15.2s
Freescale i.MX53 / ARMv7 Processor rev 5 (v7l)                   GCC 12.2.0 -O3 (*)  24.1s     15.7s
POWER8 (architected), altivec supported, big endian, ppc64       GCC 12.2.0 -O3      43.1s     50.8s
POWER8 (architected), altivec supported, little endian, ppc64el  GCC 12.2.0 -O3      7.8s      4.7s
PA8900 (Shortfin) PA-RISC                                        GCC 12.2.0 -O3 (*)  39.9s     47.2s
IBM/S390 aka s390x                                               GCC 12.2.0 -O3      82.2s     30.8s
Intel(R) Itanium(R) Processor 9320                               GCC 12.2.0 -O3      89.5s     78.1s
Cavium Octeon III V0.2 FPU V0.0 / mipsel                         GCC 12.2.0 -O3      117.5s    118.5s

(*) 1e6 outer iterations instead of 1e7.
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> It is dangerous to be right in matters on which the established authorities
> are wrong. -- Voltaire
> _______________________________________________
> ffmpeg-devel mailing list
> [email protected]
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> [email protected] with subject "unsubscribe".