"David Mathog" <mat...@caltech.edu> writes:

> I tried to track down the bug mentioned previously in testing my
> software SSE2 when compiled with -m64 and ended up removing all
> of the CHECK and my own includes without eliminating the bug. The test
> program works fine with -m32, or with -m64 -msse2, but it fails with
> -m64 -mno-sse2. Here is the greatly reduced gccprob2.c:
>
> 8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<
> #include <stdio.h> /* for printf */
> typedef double __m128d __attribute__ ((__vector_size__ (16),
> __may_alias__));
> typedef union
> {
>   __m128d x;
>   double a[2];
> } union128d;
> #define EMM_FLT8(a) ((double *)&(a))
>
> void test ( __m128d s1, __m128d s2)
> {
>   printf("test s1 %lf %lf\n",EMM_FLT8(s1)[0],EMM_FLT8(s1)[1]);
>   printf("test s2 %lf %lf\n",EMM_FLT8(s2)[0],EMM_FLT8(s2)[1]);
> }
>
> int main (void)
> {
>   __attribute__ ((aligned (16))) union128d s1;
>   s1.a[0] = 1.0;
>   s1.a[1] = 2.0;
>   printf("s1 %lf %lf\n",s1.a[0],s1.a[1]);
>   test (s1.x, s1.x);
> }
> 8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<8<
>
> Test runs:
>
> % gcc -msse -mno-sse2 -m64 -o foo gccprob2.c
> % ./foo #first value in s2 is wrong
> s1 1.000000 2.000000
> test s1 1.000000 2.000000
> test s2 2.000000 2.000000
> % gcc -msse -msse2 -m64 -o foo gccprob2.c
> % ./foo
> s1 1.000000 2.000000
> test s1 1.000000 2.000000
> test s2 1.000000 2.000000
> % gcc -msse -mno-sse2 -lm -m32 -o foo gccprob2.c
> % ./foo
> s1 1.000000 2.000000
> test s1 1.000000 2.000000
> test s2 1.000000 2.000000
> % gcc --version
> gcc (GCC) 4.4.1
> % cat /etc/release
> Mandriva Linux release 2010.0 (Official) for x86_64
> % cat /proc/cpuinfo | head -10
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 33
> model name : Dual Core AMD Opteron(tm) Processor 280
> stepping : 2
> cpu MHz : 1000.000
> cache size : 1024 KB
> physical id : 0
> siblings : 2
>
> Is there something wrong with this program or is this a compiler bug?

I think this is a compiler bug in the i386 backend. The
classify_argument function uses X86_64_SSEUP_CLASS for V2DFmode, and
examine_argument counts that as requiring a single SSE register.
However, since the SSE2 instructions are not available, the argument is
split into two SSE registers. The result is that the first argument is
passed in %xmm0/%xmm1, and the second argument is passed in
%xmm1/%xmm2. That is, the arguments overlap, leading to the incorrect
result.

Basically, the 64-bit calling convention support assumes that the SSE2
instructions are always available, and silently fails when -mno-sse2 is
used.

I don't really have an opinion as to whether the compiler needs to
support this case correctly, but I think that clearly it must not
silently fail.

Please consider opening a bug report for this. Thanks.

Ian
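
To make the overlap described above concrete, here is a toy model of the register bookkeeping. It is only a sketch under stated assumptions: the function names (first_reg, last_reg, args_overlap, print_layout) are hypothetical and are not GCC internals, and the model assumes each vector argument is charged regs_counted registers while really occupying regs_used consecutive ones.

```c
#include <stdio.h>

/* Toy model only: these names are hypothetical, not GCC internals.
   regs_counted = SSE registers the bookkeeping charges per vector argument.
   regs_used    = SSE registers the argument really occupies. */

/* Index of the first %xmm register given to argument arg_index (0-based). */
int first_reg(int arg_index, int regs_counted)
{
    return arg_index * regs_counted;
}

/* Index of the last %xmm register that argument actually occupies. */
int last_reg(int arg_index, int regs_counted, int regs_used)
{
    return first_reg(arg_index, regs_counted) + regs_used - 1;
}

/* Nonzero if consecutive arguments share at least one register. */
int args_overlap(int regs_counted, int regs_used)
{
    return last_reg(0, regs_counted, regs_used) >= first_reg(1, regs_counted);
}

/* Print the register range for the first nargs arguments. */
void print_layout(int nargs, int regs_counted, int regs_used)
{
    int i;
    for (i = 0; i < nargs; i++)
        printf("arg %d: %%xmm%d..%%xmm%d\n", i,
               first_reg(i, regs_counted),
               last_reg(i, regs_counted, regs_used));
}
```

Calling print_layout(2, 1, 2) models the -m64 -mno-sse2 case: one register counted, two used, so argument 0 lands in %xmm0..%xmm1 and argument 1 in %xmm1..%xmm2, sharing %xmm1. With one register counted and one used, as when SSE2 is available, args_overlap(1, 1) is zero. The shared %xmm1 is consistent with the reported output, where the first element of s2 shows the second element of s1.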