powerpc/rs6000 implicit FPU usage
The issue of gcc implicitly generating floating-point instructions (i.e., without 'double' or 'float' types being used in the source code) has come up a few times in the past (e.g., 2002/10: GCC floating point usage).

Miraculously, I found that gcc-4.0.2 (unlike 3.2 or 3.4) no longer generates an lfd/stfd pair for something like this:

    unsigned long long *px, *py;
    *px = *py;

However, I was unable to find any documentation or recent discussion about the issue. Has this kind of optimization (using the FPU for data objects other than double/float) been deliberately abandoned, or is it a side effect of other changes?

Thanks
-- Till

PS: Please CC me - I'm not on this list
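For anyone who wants to check a particular compiler, a minimal self-contained version of the fragment might look like the sketch below. The function name copy_u64 is made up for illustration. Compiling it with something like gcc -O2 -S for a 32-bit PowerPC target and searching the output for lfd/stfd should show whether the FPU is used for the copy.

```c
#include <assert.h>

/* Hypothetical helper (name chosen for illustration only): copies one
 * 64-bit integer.  No 'float' or 'double' appears anywhere, so any
 * lfd/stfd pair in the generated assembly is an implicit use of the FPU. */
void copy_u64(unsigned long long *px, const unsigned long long *py)
{
    assert(px != 0 && py != 0);
    *px = *py;
}
```

On rs6000 targets, -msoft-float asks gcc not to use the floating-point registers at all, which is the blunt way to rule out implicit FPU usage regardless of compiler version.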
gcc 4.3.2 vectorizes access to volatile array
gcc-4.3.2 seems to produce bad code when accessing an array of small 'volatile' objects -- it may try to access multiple such objects in a 'parallel' fashion. E.g., instead of reading two consecutive 'volatile short's sequentially it reads a single 32-bit longword. This may crash e.g., when accessing a memory-mapped device which allows only 16-bit accesses.

If I compile this code fragment

    void volarrcpy(short *d, volatile short *s, int n)
    {
        int i;
        for (i=0; i
Re: gcc 4.3.2 vectorizes access to volatile array
Andrew Haley wrote:
> Till Straumann wrote:
>> gcc-4.3.2 seems to produce bad code when accessing an array of small 'volatile' objects -- it may try to access multiple such objects in a 'parallel' fashion. E.g., instead of reading two consecutive 'volatile short's sequentially it reads a single 32-bit longword. This may crash e.g., when accessing a memory-mapped device which allows only 16-bit accesses.
>>
>> If I compile this code fragment
>>
>>     void volarrcpy(short *d, volatile short *s, int n)
>>     {
>>         int i;
>>         for (i=0; i
>
> I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)"
>
> .L8:
>     movzwl (%ecx), %eax
>     addl $1, %ebx
>     addl $2, %ecx
>     movw %ax, (%edx)
>     addl $2, %edx
>     cmpl %ebx, 16(%ebp)
>     jg .L8
>
> I think you should upgrade.
>
> Andrew.

OK, try this then:

    void c(char *d, volatile char *s)
    {
        int i;
        for ( i=0; i<32; i++ )
            d[i]=s[i];
    }

(gcc --version: gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3)

gcc -m32 -c -S -O3 produces an unrolled sequence:

    movzbl (%ecx), %eax
    leal 20(%ebx), %edx
    movl (%ecx), %eax
    movl %eax, (%edi)
    movzbl 1(%ecx), %eax
    movl 4(%ecx), %eax
    movl %eax, 4(%edi)
    movzbl 2(%ecx), %eax
    movl 4(%ebx), %eax
    movl %eax, 4(%esi)
    movzbl 3(%ecx), %eax
    movl 8(%ebx), %eax
    movl %eax, 8(%esi)
    ... < snip >...

The 64-bit version even uses SSE registers to load the volatile data:
(gcc -c -S -O3)

.L7:
    movzbl (%rsi), %eax
    movdqu (%rsi), %xmm0
    movdqa %xmm0, (%rdi)
    movzbl 1(%rsi), %eax
    movdqu (%rdx), %xmm0
    movdqa %xmm0, 16(%rdi)

Not sure an upgrade helps ;-)

-- Till
Re: gcc 4.3.2 vectorizes access to volatile array
That's roughly the same that 4.3.3 produces. I had not quoted the full assembly code but just the essential part that is executed when source and destination are 4-byte aligned and are more than 4-bytes apart. Otherwise (not longword-aligned) the (correct) code labeled '.L5' is executed.

-- Till

Andrew Haley wrote:
> H.J. Lu wrote:
>> On Mon, Jun 22, 2009 at 11:14 AM, Till Straumann wrote:
>>> Andrew Haley wrote:
>>>> Till Straumann wrote:
>>>>> gcc-4.3.2 seems to produce bad code when accessing an array of small 'volatile' objects -- it may try to access multiple such objects in a 'parallel' fashion. E.g., instead of reading two consecutive 'volatile short's sequentially it reads a single 32-bit longword. This may crash e.g., when accessing a memory-mapped device which allows only 16-bit accesses.
>>>>>
>>>>> If I compile this code fragment
>>>>>
>>>>>     void volarrcpy(short *d, volatile short *s, int n)
>>>>>     {
>>>>>         int i;
>>>>>         for (i=0; i
>>>>
>>>> I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)"
>>>>
>>>> .L8:
>>>>     movzwl (%ecx), %eax
>>>>     addl $1, %ebx
>>>>     addl $2, %ecx
>>>>     movw %ax, (%edx)
>>>>     addl $2, %edx
>>>>     cmpl %ebx, 16(%ebp)
>>>>     jg .L8
>>>>
>>>> I think you should upgrade.
>>>>
>>>> Andrew.
>>>
>>> OK, try this then:
>>>
>>>     void c(char *d, volatile char *s)
>>>     {
>>>         int i;
>>>         for ( i=0; i<32; i++ )
>>>             d[i]=s[i];
>>>     }
>>>
>>> (gcc --version: gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3)
>>
>> ^ That may be too old. Gcc 4.3.4 revision 148680 generates:
>>
>> .L5:
>>     leaq (%rsi,%rdx), %rax
>>     movzbl (%rax), %eax
>>     movb %al, (%rdi,%rdx)
>>     addq $1, %rdx
>>     cmpq $32, %rdx
>>     jne .L5
>
> 4.4.0 20090307 generates truly bizarre code, though:
>
> gcc -m32 -c -S -O3 x.c
>
> c:
>     pushl %ebp
>     movl %esp, %ebp
>     pushl %ebx
>     movl 12(%ebp), %edx
>     movl 8(%ebp), %ebx
>     movl %edx, %ecx
>     orl %ebx, %ecx
>     andl $3, %ecx
>     leal 4(%ebx), %eax
>     je .L10
> .L2:
>     xorl %eax, %eax
>     .p2align 4,,7
>     .p2align 3
> .L5:
>     leal (%edx,%eax), %ecx
>     movzbl (%ecx), %ecx
>     movb %cl, (%ebx,%eax)
>     addl $1, %eax
>     cmpl $32, %eax
>     jne .L5
>     popl %ebx
>     popl %ebp
>     ret
>     .p2align 4,,7
>     .p2align 3
> .L10:
>     leal 4(%edx), %ecx
>     cmpl %ecx, %ebx
>     jbe .L11
> .L7:
>     movzbl (%edx), %ecx
>     movl (%edx), %ecx
>     movl %ecx, (%ebx)
>     movzbl 1(%edx), %ecx
>     movl 4(%edx), %ecx
>     movl %ecx, 4(%ebx)
>     movzbl 2(%edx), %ecx
>     movl 8(%edx), %ecx
>     movl %ecx, 4(%eax)
>     movzbl 3(%edx), %ecx
>     movl 12(%edx), %ecx
>     movl %ecx, 8(%eax)
>     movzbl 4(%edx), %ecx
>     movl 16(%edx), %ecx
>     movl %ecx, 12(%eax)
>     movzbl 5(%edx), %ecx
>     movl 20(%edx), %ecx
>     movl %ecx, 16(%eax)
>     movzbl 6(%edx), %ebx
>     leal 24(%edx), %ecx
>     movl 24(%edx), %ebx
>     movl %ebx, 20(%eax)
>     movzbl 7(%edx), %edx
>     movl 4(%ecx), %edx
>     movl %edx, 24(%eax)
>     popl %ebx
>     popl %ebp
>     ret
>     .p2align 4,,7
>     .p2align 3
> .L11:
>     cmpl %edx, %eax
>     jae .L2
>     jmp .L7
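To try this on other compiler versions, a self-contained variant of the byte-copy test case is sketched below. The name vol_copy and the length parameter are additions for illustration; the loop itself is the one from the thread. Each iteration is supposed to perform exactly one byte-sized load from the volatile source, so any wider load (movl, movdqu) from the source in the -O3 output is the behaviour being complained about.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes from a volatile source, one element per iteration.
 * Hypothetical name; same loop as the test case in the thread, with the
 * fixed count of 32 replaced by a parameter. */
void vol_copy(char *d, volatile const char *s, size_t n)
{
    size_t i;

    assert(d != NULL && s != NULL);
    for (i = 0; i < n; i++)
        d[i] = s[i];   /* must stay a single 8-bit volatile read */
}
```

Since the wide loads only show up at -O3, building the affected file with -fno-tree-vectorize (or at -O2) may be worth a try as a workaround. Whether that is sufficient on a given release is an assumption that should be confirmed by reading the generated assembly.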
gcc-4.3.0/ppc32 inline assembly produces bad code
As of gcc-4.3.0 one of my drivers stopped working. I distilled the code down to a few lines (see below). The output of gcc-4.3.0 -S -O2 -c tst.c and gcc-4.2.3 -S -O2 -c tst.c is attached. The code generated by gcc-4.2.3 is as expected. However, gcc-4.3.0 does not produce a correct result.

Is my inline assembly wrong or is this a gcc bug ? (If the memory input operand to the inline assembly statement is omitted then everything is OK).

Please CC-me on any replies; I'm not subscribed to the gcc list.

WKR
-- Till

### Test case; C-code: ###

/* Powerpc I/O barrier instruction */
#define EIEIO(pmem) do { asm volatile("eieio":"=m"(*pmem):"m"(*pmem)); } while (0)

/*
 * Reset a bit in a register and poll for it to be set by the hardware
 */
void test(volatile unsigned *base)
{
    volatile unsigned *reg_p = base + IEVENT_REG/sizeof(*base);
    unsigned val;

    /* Clear bit in register by writing a one */
    *reg_p = IEVENT_GRSC;
    EIEIO( reg_p );

    /* Poll bit until set by hardware */
    do {
        val = *reg_p;
        EIEIO( reg_p );
    } while ( ! (val & IEVENT_GRSC) );
}

# Assembly code produced by gcc-4.3.0: #

    .file "tst.c"
    .gnu_attribute 4, 1
    .gnu_attribute 8, 1
    .section ".text"
    .align 2
    .globl test
    .type test, @function
test:
    li 0,256
    mr 9,3
    /*** ^ NOTE: base address copied into R9 ***/
    stw 0,16(3)
# 21 "c.c" 1
    eieio
# 0 "" 2
.L2:
    lwz 0,0(9)
    /*** ^ BAD: R0 = *base instead of *reg_p ***/
# 26 "c.c" 1
    eieio
# 0 "" 2
    andi. 11,0,256
    beq+ 0,.L2
    blr
    .size test, .-test

### Assembly code produced by gcc-4.2.3: ###

    .file "tst.c"
    .section ".text"
    .align 2
    .globl test
    .type test, @function
test:
    li 0,256
    addi 9,3,16    /*** NOTE: R9 = base + ***/
    stw 0,16(3)
    eieio
.L2:
    lwz 0,0(9)     /*** GOOD: R0 = *reg_p ***/
    eieio
    andi. 11,0,256
    beq+ 0,.L2
    blr
    .size test, .-test
    .ident "GCC: (GNU) 4.2.3"
Re: gcc-4.3.0/ppc32 inline assembly produces bad code
Daniel Jacobowitz wrote:
> On Wed, Mar 26, 2008 at 09:25:05AM -0700, Till Straumann wrote:
>> Is my inline assembly wrong or is this a gcc bug ?
>
> Your inline assembly seems wrong.

I'm not yet convinced about that...

>> /* Powerpc I/O barrier instruction */
>> #define EIEIO(pmem) do { asm volatile("eieio":"=m"(*pmem):"m"(*pmem)); } while (0)
>
> An output memory doesn't mean what you think.

How do I tell gcc that the asm modifies a certain area of memory w/o adding all memory to the clobber list? I thought the output memory operand did just that.

> I suspect GCC gave you an input memory operand as "%r0(%r9)"

Hmm you mean it gave me 0(%r9) in %r0 ? That would still be wrong because I asked for *reg_p which would be 16(%r9). Plus, I thought gcc would do that if the constraint was "r" not "m".

> and an output memory operand as "%r9",

In any case, bad code is also produced by gcc-4.3.0 if I omit the memory output operand and use an example that comes pretty close to what is in the gcc info page (example illustrating a memory input in the 'extended asm' section):

** C-Code **

#define IEVENT_REG 0x010
#define IEVENT_GRSC (1<<8)

void test(volatile unsigned *base)
{
    volatile unsigned *reg_p = base + IEVENT_REG/sizeof(*base);
    unsigned val;

    /* tell gcc that the asm needs/looks at *reg_p */
    asm volatile ("lwz %0, 16(%1)":"=r"(val):"b"(base),"m"(*reg_p));

    while ( ! (val & IEVENT_GRSC) )
        val = *reg_p;
    ;
}

*** assembly produced by gcc-4.3.0 ***

    .file "tst.c"
    .gnu_attribute 4, 1
    .gnu_attribute 8, 1
    .section ".text"
    .align 2
    .globl test
    .type test, @function
test:
    mr 9,3
# 13 "b.c" 1
    lwz 3, 16(3)
# 0 "" 2
    andi. 0,3,256
    bnelr- 0
.L5:
    lwz 0,0(9)     /* BAD: R0 = *base instead of R0 = *reg_p */
    andi. 11,0,256
    beq+ 0,.L5
    blr
    .size test, .-test
    .ident "GCC: (GNU) 4.3.0"

assembly produced by gcc-4.2.3

    .file "tst.c"
    .section ".text"
    .align 2
    .globl test
    .type test, @function
test:
    addi 9,3,16
    lwz 3, 16(3)
    andi. 0,3,256
    bnelr- 0
.L5:
    lwz 0,0(9)     /* GOOD R0 = *reg_p */
    andi. 11,0,256
    beq+ 0,.L5
    blr
    .size test, .-test
    .ident "GCC: (GNU) 4.2.3"

> and expected the asm to do what it said it would do with its operands. Which doesn't make much sense... but there you go. Try clobbering it instead, but you don't even need to since the pointer is already volatile. asm volatile ("eieio") should work fine.

Yes but there still seems to be a problem (see example in this message)

-- Till.
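For reference, the variant suggested in the reply (a barrier with no memory operands) could be sketched as follows. The "memory" clobber is the coarse option mentioned above, not the narrow per-object dependency that was asked for. The IEVENT_* values are the ones given in this message, with an unsigned constant for the bit mask. The non-PowerPC branch is purely an assumption added so the sketch builds on other hosts; there it is only a compiler barrier, not a hardware one.

```c
#include <assert.h>

#define IEVENT_REG  0x010
#define IEVENT_GRSC (1u << 8)

/* I/O barrier without memory operands.  On PowerPC this emits eieio;
 * elsewhere (assumption, for building the sketch only) it degrades to a
 * pure compiler barrier. */
#if defined(__powerpc__) || defined(__PPC__)
#define EIEIO() __asm__ __volatile__("eieio" ::: "memory")
#else
#define EIEIO() __asm__ __volatile__("" ::: "memory")
#endif

/* Reset a bit in a register and poll for it to be set by the hardware */
void test(volatile unsigned *base)
{
    volatile unsigned *reg_p = base + IEVENT_REG / sizeof(*base);
    unsigned val;

    assert(base != 0);

    /* Clear bit in register by writing a one */
    *reg_p = IEVENT_GRSC;
    EIEIO();

    /* Poll bit until set by hardware */
    do {
        val = *reg_p;
        EIEIO();
    } while (!(val & IEVENT_GRSC));
}
```

Because reg_p is a pointer to volatile, the loads and stores themselves are not supposed to be cached or dropped by the compiler; the barrier only orders them. Against ordinary memory instead of a device the polling loop terminates at once, since the bit just written reads back as set.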
Re: gcc-4.3.0/ppc32 inline assembly produces bad code
Andreas Schwab wrote:
> Till Straumann <[EMAIL PROTECTED]> writes:
>
>> /* Powerpc I/O barrier instruction */
>> #define EIEIO(pmem) do { asm volatile("eieio":"=m"(*pmem):"m"(*pmem)); } while (0)
>
> Looking closer, your asm statement has a bug. The "m" constraint can match memory addresses with side effects (auto inc/dec), but the insn does not carry out that side effect.

I'm sorry - that's a bit too brief for me to understand. Are you talking about the memory-input or memory-output operand? Could you please elaborate a little bit?

Also, did you look at my other example?

    void test(volatile unsigned *base)
    {
        volatile unsigned *reg_p = base + IEVENT_REG/sizeof(*base);
        unsigned val;

        /* tell gcc that the asm needs/looks at *reg_p */
        asm volatile ("lwz %0, 16(%1)":"=r"(val):"b"(base),"m"(*reg_p));

        while ( ! (val & IEVENT_GRSC) )
            val = *reg_p;
        ;
    }

Thanks
-- Till

> On powerpc the side effect must be encoded through the update form of the load/store insns. If you don't use a load or store insn with the operand then you must use the "o" constraint to avoid the side effect.
>
> Andreas.
Re: gcc-4.3.0/ppc32 inline assembly produces bad code
Andreas Schwab wrote:
> Daniel Jacobowitz <[EMAIL PROTECTED]> writes:
>
>> On Mon, Mar 31, 2008 at 03:06:24PM +0200, Andreas Schwab wrote:
>>> The side effect is carried out by using %U0, which expands to u for a PRE_{INC,DEC,MODIFY} operand. There is no way to encode that in the insn operand itself, unlike m68k, for example. The ia64 target has a similar issue.
>>
>> OK, so it's possible to get that right. Still - how many people writing inline assembly do we think will do so? This is back to something the S/390 maintainers were working on a few months ago; in short the useful definition of "m" to GCC is not the useful one for users, I don't think. Especially when it changes.
>
> There are many pitfalls when you use inline asm. Another example is the difference between "r" and "b" on powerpc, where you can silently get bad code if you use "r" when "b" would be needed (in some insns 0 means constant 0 instead of register 0).

Sure - but that is at least (minimally) documented. One sentence more in the gcc manual wouldn't hurt though...

T.

> Andreas.
Re: gcc-4.3.0/ppc32 inline assembly produces bad code
Andreas Schwab wrote:
> Till Straumann <[EMAIL PROTECTED]> writes:
>
>> asm volatile ("lwz %0, 16(%1)":"=r"(val):"b"(base),"m"(*reg_p));
>
> asm volatile ("lwz%U1%X1 %0, %1":"=r"(val):"m"(*reg_p));

Hmm - that is beyond me. What exactly do %U1 and %X1 mean? I suspect that %U1 means that operand #1 is to be updated, right?

How am I supposed to know that my version is wrong (and it used to work with older gccs)? I just followed the manual which states:

>  If your assembler instructions access memory in an unpredictable
> fashion, add `memory' to the list of clobbered registers. This will
> cause GCC to not keep memory values cached in registers across the
> assembler instruction and not optimize stores or loads to that memory.
> You will also want to add the `volatile' keyword if the memory affected
> is not listed in the inputs or outputs of the `asm', as the `memory'
> clobber does not count as a side-effect of the `asm'. If you know how
> large the accessed memory is, you can add it as input or output but if
> this is not known, you should add `memory'. As an example, if you
> access ten bytes of a string, you can use a memory input like:
>
>     {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>
>  Note that in the following example the memory input is necessary,
> otherwise GCC might optimize the store to `x' away:
>     int foo ()
>     {
>       int x = 42;
>       int *y = &x;
>       int result;
>       asm ("magic stuff accessing an 'int' pointed to by '%1'"
>            "=&d" (r) : "a" (y), "m" (*y));
>       return result;
>     }

IMHO my example is pretty close to this...

Also, the manual doesn't say that 'm' can only match memory addresses with side effects; it says 'a memory operand is allowed, with any kind of address that the machine supports in general'.

Thanks
-- Till

> Andreas.
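As a sketch of how the corrected form fits into a complete function: the template below is the one quoted from Andreas, wrapped in a helper whose name (load_word) is made up for illustration. The idea, as far as the thread explains it, is that gcc is free to pick any addressing mode for the "m" operand, and the modifiers let the mnemonic follow that choice: %U1 adds the 'u' for an update-form address, and %X1 (an assumption not spelled out in the thread) adds the 'x' for an indexed address. The fallback branch is likewise an assumption so that the sketch builds on non-PowerPC hosts.

```c
#include <assert.h>

/* Load one 32-bit word through a memory operand (hypothetical helper). */
static unsigned load_word(volatile unsigned *reg_p)
{
    unsigned val;

    assert(reg_p != 0);
#if defined(__powerpc__) || defined(__PPC__)
    /* gcc chooses the address form of %1; %U1/%X1 turn lwz into
     * lwzu/lwzx/lwzux so that the instruction matches that form. */
    __asm__ __volatile__("lwz%U1%X1 %0, %1" : "=r"(val) : "m"(*reg_p));
#else
    /* Assumption for other hosts: a plain volatile load. */
    val = *reg_p;
#endif
    return val;
}
```

The original version hard-coded "16(%1)" in the template while also passing *reg_p as an "m" operand that the instruction never used, which is the situation Andreas describes as needing the "o" constraint instead.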
inline assembly question (memory side-effects)
Hi.

What is the proper way to tell gcc that an inline assembly statement either modifies a particular area of memory or needs it to be updated/in-sync because the assembly reads from it?

E.g., assume I have a

    struct blah {
        int sum;
        ...
    };

which is accessed by my assembly code.

    int my_chksum(struct blah *blah_p)
    {
        blah_p->sum = 0;
        /* assembly code computes checksum, writes to blah_p->sum */
        asm volatile("..."::"b"(blah_p));
        return blah_p->sum;
    }

How can I tell gcc that the object *blah_p is accessed (read and/or written to) by the asm (without declaring the object volatile or using a general "memory" clobber)?

(If I don't take any measures then gcc won't know that blah_p->sum is modified by the assembly and will optimize the load operation away and always return 0.)

A while ago (see this thread http://gcc.gnu.org/ml/gcc/2008-03/msg00976.html) I was told that simply adding the affected memory area in question as memory ('m') output- or input-operands is not appropriate -- note that this is in contradiction to what the gcc info page suggests (section about 'extended asm': "If you know how large the accessed memory is, you can add it as input or output..." along with an example).

Could somebody please clarify (please don't suggest how this particular example could be improved but try to answer the fundamental question)?

Thanks
-- Till

PS: please CC me as I'm not subscribed to the gcc list
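For concreteness, the reading of the manual text that the question refers to would look roughly like the sketch below: the whole object is listed as a read-write memory operand ("+m"), so gcc has to store blah_p->sum before the asm and reload it afterwards, without a global "memory" clobber. Whether this is the sanctioned approach on every target is exactly what the post asks, so this is an illustration of the question rather than an answer. The empty template stands in for the real checksum code, and "r" replaces the PowerPC-specific "b" so the sketch builds elsewhere; both are assumptions.

```c
#include <assert.h>

struct blah {
    int sum;
    /* ... other members elided in the original ... */
};

int my_chksum(struct blah *blah_p)
{
    assert(blah_p != 0);

    blah_p->sum = 0;
    /* Placeholder for the assembly that computes the checksum and writes
     * blah_p->sum.  "+m"(*blah_p) declares the object as both read and
     * written by the asm; the template itself is intentionally empty. */
    __asm__ __volatile__("" : "+m"(*blah_p) : "r"(blah_p));
    return blah_p->sum;
}
```

With the empty placeholder the function simply returns 0. The point is only that the final load of blah_p->sum can no longer be folded into the constant stored before the asm. Per the earlier thread, on PowerPC an "m" operand that no load/store instruction in the template actually uses may need the "o" constraint instead.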