Till Straumann wrote:
> gcc-4.3.2 seems to produce bad code when
> accessing an array of small 'volatile'
> objects -- it may try to access multiple
> such objects in a 'parallel' fashion.
> E.g., instead of reading two consecutive
> 'volatile short's sequentially it reads
> a single 32-bit longword. This can crash,
> for example, when accessing a memory-mapped
> device that allows only 16-bit accesses.
>
> If I compile this code fragment
>
> void volarrcpy(short *d, volatile short *s, int n)
> {
>     int i;
>     for (i = 0; i < n; i++)
>         d[i] = s[i];
> }
>
>
> with '-O3' (the critical option seems to be '-ftree-vectorize')
> then gcc-4.3.2 produces quite complicated code
> but the essential section is (powerpc)
>
> .L7:
>         lhz 0,0(11)
>         addi 11,11,2
>         lwzx 0,4,9
>         stwx 0,3,9
>         addi 9,9,4
>         bdnz .L7
>
> or i386
>
> .L7:
>         movw (%ecx), %ax
>         movl (%esi,%edx,4), %eax
>         movl %eax, (%ebx,%edx,4)
>         incl %edx
>         addl $2, %ecx
>         cmpl %edx, -20(%ebp)
>         ja .L7
>
>
> Translated back into C, this reads
>
> uint32_t *dst_l = (uint32_t*)d;
> uint32_t *src_l = (uint32_t*)s;
>
> for (i=0; i<n/2; i++) {
>     (void)s[i];            /* 16-bit read; the value is overwritten before use */
>     dst_l[i] = src_l[i];   /* 32-bit read and 32-bit write */
> }
>
> This code seems neither optimal nor correct.
> Besides reading half of the locations twice,
> which violates the semantics of volatile
> objects, accessing such objects in a 'vectorized'
> way (in this case, gcc emits a single 32-bit
> read instead of reading two adjacent short
> addresses) seems illegal to me.
>
> Similar behavior seems to be present in 4.3.3.
>
> Does anybody have some insight? Should I file
> a bug report?
I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)":

.L8:
        movzwl (%ecx), %eax
        addl $1, %ebx
        addl $2, %ecx
        movw %ax, (%edx)
        addl $2, %edx
        cmpl %ebx, 16(%ebp)
        jg .L8

I think you should upgrade.
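
For anyone who wants to repeat the comparison, here is a minimal sketch of a test file. volarrcpy is the function from the report; volarrcpy_selftest is a hypothetical helper added only for illustration. It checks the copied values, not the width of the loads, so the generated assembly still has to be inspected by hand.

```c
/* Loop from the original report: each s[i] is a volatile 16-bit
 * object and should be read exactly once, with a 16-bit load. */
void volarrcpy(short *d, volatile short *s, int n)
{
    int i;
    for (i = 0; i < n; i++)
        d[i] = s[i];
}

/* Hypothetical helper (not part of the report): copies a known
 * pattern and returns 1 if every element arrived intact, else 0.
 * This verifies the result only; it cannot detect a 32-bit load. */
int volarrcpy_selftest(void)
{
    enum { N = 9 };            /* odd length, so the count is not a multiple of 2 */
    volatile short src[N];
    short dst[N];
    int i;

    for (i = 0; i < N; i++) {
        src[i] = (short)(0x1000 + i);
        dst[i] = 0;
    }

    volarrcpy(dst, src, N);

    for (i = 0; i < N; i++)
        if (dst[i] != (short)(0x1000 + i))
            return 0;
    return 1;
}
```

Compiling this with "gcc -O3 -S" and reading the loop body for volarrcpy shows whether the loads from s are 16 bits wide (lhz on powerpc, movw or movzwl on i386, as in the listings above). Since the report suspects '-ftree-vectorize', rebuilding with "-O3 -fno-tree-vectorize" on the affected compiler should show whether the wide load goes away; I have not verified that on 4.3.2.
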
Andrew.