On Mon, Jun 22, 2009 at 11:14 AM, Till
Straumann<[email protected]> wrote:
> Andrew Haley wrote:
>>
>> Till Straumann wrote:
>>
>>>
>>> gcc-4.3.2 seems to produce bad code when
>>> accessing an array of small 'volatile'
>>> objects -- it may try to access multiple
>>> such objects in a 'parallel' fashion.
>>> E.g., instead of reading two consecutive
>>> 'volatile short's sequentially it reads
>>> a single 32-bit longword. This may crash
>>> e.g., when accessing a memory-mapped device
>>> which allows only 16-bit accesses.
>>>
>>> If I compile this code fragment
>>>
>>> void volarrcpy(short *d, volatile short *s, int n)
>>> {
>>> int i;
>>> for (i=0; i<n; i++)
>>> d[i] = s[i];
>>> }
>>>
>>>
>>> with '-O3' (the critical option seems to be '-ftree-vectorize')
>>> then gcc-4.3.2 produces quite complicated code
>>> but the essential section is (powerpc)
>>>
>>> .L7:
>>> lhz 0,0(11)
>>> addi 11,11,2
>>> lwzx 0,4,9
>>> stwx 0,3,9
>>> addi 9,9,4
>>> bdnz .L7
>>>
>>> or i386
>>>
>>> .L7:
>>> movw (%ecx), %ax
>>> movl (%esi,%edx,4), %eax
>>> movl %eax, (%ebx,%edx,4)
>>> incl %edx
>>> addl $2, %ecx
>>> cmpl %edx, -20(%ebp)
>>> ja .L7
>>>
>>>
>>> Disassembled back into C-code, this reads
>>>
>>> uint32_t *dst_l = (uint32_t*)d;
>>> uint32_t *src_l = (uint32_t*)s;
>>>
>>> for (i=0; i<n/2; i++) {
>>> d[i] = s[i];
>>> dst_l[i] = src_l[i];
>>> }
>>>
>>> This code seems neither optimal nor correct.
>>> Besides reading half of the locations twice
>>> which violates the semantics of volatile
>>> objects accessing such objects in a 'vectorized'
>>> way (in this case: instead of reading
>>> two adjacent short addresses gcc emits
>>> a single 32-bit read) seems illegal to me.
>>>
>>> Similar behavior seems to be present in 4.3.3.
>>>
>>> Does anybody have some insight? Should I file
>>> a bug report?
>>>
>>
>> I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)"
>>
>> .L8:
>> movzwl (%ecx), %eax
>> addl $1, %ebx
>> addl $2, %ecx
>> movw %ax, (%edx)
>> addl $2, %edx
>> cmpl %ebx, 16(%ebp)
>> jg .L8
>>
>> I think you should upgrade.
>>
>> Andrew.
>>
>
> OK, try this then:
>
> void
> c(char *d, volatile char *s)
> {
> int i;
> for ( i=0; i<32; i++ )
> d[i]=s[i];
> }
>
>
> (gcc --version: gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
That may be too old. Gcc 4.3.4 revision 148680
generates:
.L5:
leaq (%rsi,%rdx), %rax
movzbl (%rax), %eax
movb %al, (%rdi,%rdx)
addq $1, %rdx
cmpq $32, %rdx
jne .L5
--
H.J.