powerpc/rs6000 implicit FPU usage

2005-11-01 Thread Till Straumann

The issue of gcc implicitly generating floating point
instructions (i.e., without 'double' or 'float' types
being used in the source code) has come up a few times
in the past (e.g., 2002/10: GCC floating point usage)

Miraculously, I found that gcc-4.0.2 (unlike 3.2 or 3.4)
no longer generates a lfd/stfd pair for something like this:

unsigned long long *px, *py;

 *px = *py;

However, I was unable to find any documentation or recent
discussion about the issue.
Has this kind of optimization (using the FPU for
data objects other than double/float) been deliberately
abandoned, or is it a side effect of other changes?
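For drivers that must never touch the FPU, one defensive option is to copy such objects explicitly as two 32-bit halves, so the question never arises regardless of how the compiler would translate a plain *px = *py. A sketch (the helper name is made up; a compiler is in principle still free to rewrite this, which is why kernels and RTOSes typically also build with -msoft-float):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a 64-bit datum as two 32-bit words.
 * memcpy is used for the per-half moves to stay clear of
 * strict-aliasing problems; each half is an integer operation. */
static void copy_u64_as_words(uint64_t *dst, const uint64_t *src)
{
    uint32_t lo, hi;

    memcpy(&lo, (const char *)src,     sizeof lo);
    memcpy(&hi, (const char *)src + 4, sizeof hi);
    memcpy((char *)dst,     &lo, sizeof lo);
    memcpy((char *)dst + 4, &hi, sizeof hi);
}
```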

Thanks
-- Till

PS: Please CC me - I'm not on this list


gcc 4.3.2 vectorizes access to volatile array

2009-06-22 Thread Till Straumann

gcc-4.3.2 seems to produce bad code when
accessing an array of small 'volatile'
objects -- it may try to access multiple
such objects in a 'parallel' fashion.
E.g., instead of reading two consecutive
'volatile short's sequentially it reads
a single 32-bit longword. This may crash
e.g., when accessing a memory-mapped device
which allows only 16-bit accesses.

If I compile this code fragment

void volarrcpy(short *d, volatile short *s, int n)
{
int i;
 for (i=0; i<n; i++)
 d[i] = s[i];
}

Re: gcc 4.3.2 vectorizes access to volatile array

2009-06-22 Thread Till Straumann

Andrew Haley wrote:

Till Straumann wrote:
  

gcc-4.3.2 seems to produce bad code when
accessing an array of small 'volatile'
objects -- it may try to access multiple
such objects in a 'parallel' fashion.
E.g., instead of reading two consecutive
'volatile short's sequentially it reads
a single 32-bit longword. This may crash
e.g., when accessing a memory-mapped device
which allows only 16-bit accesses.

If I compile this code fragment

void volarrcpy(short *d, volatile short *s, int n)
{
int i;
 for (i=0; i<n; i++)
 d[i] = s[i];
}


I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)"

.L8:
movzwl  (%ecx), %eax
addl$1, %ebx
addl$2, %ecx
movw%ax, (%edx)
addl$2, %edx
cmpl%ebx, 16(%ebp)
jg  .L8

I think you should upgrade.

Andrew.
  


OK, try this then:

void
c(char *d, volatile char *s)
{
int i;
   for ( i=0; i<32; i++ )
   d[i]=s[i];
}


(gcc --version: gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3)

gcc -m32 -c -S -O3
produces an unrolled sequence:

   movzbl  (%ecx), %eax
   leal20(%ebx), %edx
   movl(%ecx), %eax
   movl%eax, (%edi)
   movzbl  1(%ecx), %eax
   movl4(%ecx), %eax
   movl%eax, 4(%edi)
   movzbl  2(%ecx), %eax
   movl4(%ebx), %eax
   movl%eax, 4(%esi)
   movzbl  3(%ecx), %eax
   movl8(%ebx), %eax
   movl%eax, 8(%esi)
... < snip >...

The 64-bit version even uses SSE registers to
load the volatile data:

(gcc -c -S -O3)

.L7:
   movzbl  (%rsi), %eax
   movdqu  (%rsi), %xmm0
   movdqa  %xmm0, (%rdi)
   movzbl  1(%rsi), %eax
   movdqu  (%rdx), %xmm0
   movdqa  %xmm0, 16(%rdi)

Not sure an upgrade helps ;-)

-- Till


Re: gcc 4.3.2 vectorizes access to volatile array

2009-06-22 Thread Till Straumann

That's roughly the same code that 4.3.3 produces.
I had not quoted the full assembly code but just
the essential part, which is executed when
source and destination are 4-byte aligned
and more than 4 bytes apart.
Otherwise (not longword-aligned) the
(correct) code labeled '.L5' is executed.

-- Till

Andrew Haley wrote:

H.J. Lu wrote:
  

On Mon, Jun 22, 2009 at 11:14 AM, Till
Straumann wrote:


Andrew Haley wrote:
  

Till Straumann wrote:



gcc-4.3.2 seems to produce bad code when
accessing an array of small 'volatile'
objects -- it may try to access multiple
such objects in a 'parallel' fashion.
E.g., instead of reading two consecutive
'volatile short's sequentially it reads
a single 32-bit longword. This may crash
e.g., when accessing a memory-mapped device
which allows only 16-bit accesses.

If I compile this code fragment

void volarrcpy(short *d, volatile short *s, int n)
{
int i;
 for (i=0; i<n; i++)
 d[i] = s[i];
}

I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)"

.L8:
   movzwl  (%ecx), %eax
   addl$1, %ebx
   addl$2, %ecx
   movw%ax, (%edx)
   addl$2, %edx
   cmpl%ebx, 16(%ebp)
   jg  .L8

I think you should upgrade.

Andrew.



OK, try this then:

void
c(char *d, volatile char *s)
{
int i;
  for ( i=0; i<32; i++ )
  d[i]=s[i];
}


(gcc --version: gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3)
  

  ^

That may be too old.  Gcc 4.3.4 revision 148680
generates:

.L5:
leaq(%rsi,%rdx), %rax
movzbl  (%rax), %eax
movb%al, (%rdi,%rdx)
addq$1, %rdx
cmpq$32, %rdx
jne .L5



4.4.0 20090307 generates truly bizarre code, though:

gcc -m32 -c -S -O3  x.c

c:
pushl   %ebp
movl%esp, %ebp
pushl   %ebx
movl12(%ebp), %edx
movl8(%ebp), %ebx
movl%edx, %ecx
orl %ebx, %ecx
andl$3, %ecx
leal4(%ebx), %eax
je  .L10
.L2:
xorl%eax, %eax
.p2align 4,,7
.p2align 3
.L5:
leal(%edx,%eax), %ecx
movzbl  (%ecx), %ecx
movb%cl, (%ebx,%eax)
addl$1, %eax
cmpl$32, %eax
jne .L5
popl%ebx
popl%ebp
ret
.p2align 4,,7
.p2align 3
.L10:
leal4(%edx), %ecx
cmpl%ecx, %ebx
jbe .L11
.L7:
movzbl  (%edx), %ecx
movl(%edx), %ecx
movl%ecx, (%ebx)
movzbl  1(%edx), %ecx
movl4(%edx), %ecx
movl%ecx, 4(%ebx)
movzbl  2(%edx), %ecx
movl8(%edx), %ecx
movl%ecx, 4(%eax)
movzbl  3(%edx), %ecx
movl12(%edx), %ecx
movl%ecx, 8(%eax)
movzbl  4(%edx), %ecx
movl16(%edx), %ecx
movl%ecx, 12(%eax)
movzbl  5(%edx), %ecx
movl20(%edx), %ecx
movl%ecx, 16(%eax)
movzbl  6(%edx), %ebx
leal24(%edx), %ecx
movl24(%edx), %ebx
movl%ebx, 20(%eax)
movzbl  7(%edx), %edx
movl4(%ecx), %edx
movl%edx, 24(%eax)
popl%ebx
popl%ebp
ret
.p2align 4,,7
.p2align 3
.L11:
cmpl%edx, %eax
jae .L2
jmp .L7
  




gcc-4.3.0/ppc32 inline assembly produces bad code

2008-03-26 Thread Till Straumann

As of gcc-4.3.0 one of my drivers stopped working.
I distilled the code down to a few lines (see below).

The output of

   gcc-4.3.0 -S -O2 -c tst.c

and

   gcc-4.2.3 -S -O2 -c tst.c

is attached. The code generated by gcc-4.2.3 is as expected.
However, gcc-4.3.0 does not produce a correct result.

Is my inline assembly wrong or is this a gcc bug?

(If the memory input operand to the inline assembly
statement is omitted then everything is OK).

Please CC-me on any replies; I'm not subscribed to the gcc list.

WKR
-- Till

###
Test case; C-code:
###

/* Powerpc I/O barrier instruction */
#define EIEIO(pmem) do { asm volatile("eieio":"=m"(*pmem):"m"(*pmem)); } while (0)

#define IEVENT_REG  0x010
#define IEVENT_GRSC (1<<8)

/*
* Reset a bit in a register and poll for it to be set by the hardware
*/
void
test(volatile unsigned *base)
{
   volatile unsigned *reg_p = base + IEVENT_REG/sizeof(*base);
   unsigned   val;

   /* Clear bit in register by writing a one */
   *reg_p = IEVENT_GRSC;
   EIEIO( reg_p );

   /* Poll bit until set by hardware */
   do {
   val = *reg_p;
   EIEIO( reg_p );
   } while ( ! (val & IEVENT_GRSC) );
}

#
Assembly code produced by gcc-4.3.0:
#

   .file   "tst.c"
   .gnu_attribute 4, 1
   .gnu_attribute 8, 1
   .section".text"
   .align 2
   .globl test
   .type   test, @function
test:
   li 0,256
   mr 9,3
/*** ^ NOTE: base address copied into R9 ***/
   stw 0,16(3)
# 21 "c.c" 1
   eieio
# 0 "" 2
.L2:
   lwz 0,0(9)
/*** ^ BAD: R0 = *base instead of *reg_p ***/
# 26 "c.c" 1
   eieio
# 0 "" 2
   andi. 11,0,256
   beq+ 0,.L2
   blr
   .size   test, .-test

###
Assembly code produced by gcc-4.2.3:
###

   .file   "tst.c"
   .section".text"
   .align 2
   .globl test
   .type   test, @function
test:
   li 0,256
   addi 9,3,16
/*** NOTE: R9 = base + 16 ***/
   stw 0,16(3)
   eieio
.L2:
   lwz 0,0(9)
/*** GOOD: R0 = *reg_p ***/
   eieio
   andi. 11,0,256
   beq+ 0,.L2
   blr
   .size   test, .-test
   .ident  "GCC: (GNU) 4.2.3"



Re: gcc-4.3.0/ppc32 inline assembly produces bad code

2008-03-26 Thread Till Straumann

Daniel Jacobowitz wrote:

On Wed, Mar 26, 2008 at 09:25:05AM -0700, Till Straumann wrote:

Is my inline assembly wrong or is this a gcc bug ?


Your inline assembly seems wrong.

I'm not yet convinced about that...



/* Powerpc I/O barrier instruction */
#define EIEIO(pmem) do { asm volatile("eieio":"=m"(*pmem):"m"(*pmem)); } while (0)


An output memory doesn't mean what you think.

How do I tell gcc that the asm modifies a certain area of memory
w/o adding all memory to the clobber list? I thought the output
memory operand did just that.

  I suspect GCC gave you
an input memory operand as "%r0(%r9)" 

Hmm, you mean it gave me 0(%r9) in %r0? That would still be wrong
because I asked for *reg_p, which would be 16(%r9). Plus, I thought
gcc would do that if the constraint was "r", not "m".

and an output memory operand as
"%r9",

In any case, bad code is also produced by gcc-4.3.0 if I omit the
memory output operand and use an example that comes pretty
close to what is in the gcc info page (example illustrating
a memory input in the 'extended asm' section):

**
C-Code
**

#define IEVENT_REG  0x010
#define IEVENT_GRSC (1<<8)

void
test(volatile unsigned *base)
{
   volatile unsigned *reg_p = base + IEVENT_REG/sizeof(*base);
   unsigned   val;

   /* tell gcc that the asm needs/looks at *reg_p */
   asm volatile ("lwz %0, 16(%1)":"=r"(val):"b"(base),"m"(*reg_p));

   while ( ! (val & IEVENT_GRSC) )
   val = *reg_p;
   ;
}

***
assembly produced by gcc-4.3.0
***

   .file   "tst.c"
   .gnu_attribute 4, 1
   .gnu_attribute 8, 1
   .section".text"
   .align 2
   .globl test
   .type   test, @function
test:
   mr 9,3
# 13 "b.c" 1
   lwz 3, 16(3)
# 0 "" 2
   andi. 0,3,256
   bnelr- 0
.L5:
   lwz 0,0(9)  /* BAD: R0 = *base instead of R0 = *reg_p */
   andi. 11,0,256
   beq+ 0,.L5
   blr
   .size   test, .-test
   .ident  "GCC: (GNU) 4.3.0"


assembly produced by gcc-4.2.3


   .file   "tst.c"
   .section".text"
   .align 2
   .globl test
   .type   test, @function
test:
   addi 9,3,16
   lwz 3, 16(3)
   andi. 0,3,256
   bnelr- 0
.L5:
   lwz 0,0(9)   /* GOOD R0 = *reg_p */
   andi. 11,0,256
   beq+ 0,.L5
   blr
   .size   test, .-test
   .ident  "GCC: (GNU) 4.2.3"


 and expected the asm to do what it said it would do with its
operands.

Which doesn't make much sense... but there you go.

Try clobbering it instead, but you don't even need to since the
pointer is already volatile.  asm volatile ("eieio") should work fine.

Yes but there still seems to be a problem (see example in
this message)

-- Till.



Re: gcc-4.3.0/ppc32 inline assembly produces bad code

2008-03-31 Thread Till Straumann

Andreas Schwab wrote:

Till Straumann <[EMAIL PROTECTED]> writes:

  

/* Powerpc I/O barrier instruction */
#define EIEIO(pmem) do { asm volatile("eieio":"=m"(*pmem):"m"(*pmem)); } while (0)



Looking closer, your asm statement has a bug.  The "m" constraint can
match memory addresses with side effects (auto inc/dec), but the insn
does not carry out that side effect.

I'm sorry - that's a bit too brief for me to understand. Are you
talking about the memory-input or memory-output operand?
Could you please elaborate a little bit?

Also, did you look at my other example?

void
test(volatile unsigned *base)
{
  volatile unsigned *reg_p = base + IEVENT_REG/sizeof(*base);
  unsigned   val;

  /* tell gcc that the asm needs/looks at *reg_p */
  asm volatile ("lwz %0, 16(%1)":"=r"(val):"b"(base),"m"(*reg_p));

  while ( ! (val & IEVENT_GRSC) )
  val = *reg_p;
  ;
}

Thanks
-- Till



  On powerpc the side effect must be
encoded through the update form of the load/store insns.  If you don't
use a load or store insn with the operand then you must use the "o"
constraint to avoid the side effect.

Andreas.

  





Re: gcc-4.3.0/ppc32 inline assembly produces bad code

2008-03-31 Thread Till Straumann

Andreas Schwab wrote:

Daniel Jacobowitz <[EMAIL PROTECTED]> writes:

  

On Mon, Mar 31, 2008 at 03:06:24PM +0200, Andreas Schwab wrote:


The side effect is carried out by using %U0, which expands to u for a
PRE_{INC,DEC,MODIFY} operand.  There is no way to encode that in the
insn operand itself, unlike m68k, for example.  The ia64 target has a
similar issue.
  

OK, so it's possible to get that right.  Still - how many people
writing inline assembly do we think will do so?

This is back to something the S/390 maintainers were working on a few
months ago; in short the useful definition of "m" to GCC is not the
useful one for users, I don't think.  Especially when it changes.



There are many pitfalls when you use inline asm.  Another example is the
difference between "r" and "b" on powerpc, where you can silently get
bad code if you use "r" when "b" would be needed (in some insns 0 means
constant 0 instead of register 0).
  

Sure - but that is at least (minimally) documented. One sentence
more in the gcc manual wouldn't hurt though...

T.

Andreas.

  





Re: gcc-4.3.0/ppc32 inline assembly produces bad code

2008-03-31 Thread Till Straumann

Andreas Schwab wrote:

Till Straumann <[EMAIL PROTECTED]> writes:

  

  asm volatile ("lwz %0, 16(%1)":"=r"(val):"b"(base),"m"(*reg_p));



asm volatile ("lwz%U1%X1 %0, %1":"=r"(val):"m"(*reg_p));
  

Hmm - that is beyond me. What exactly do %U1 and %X1 mean?
I suspect that %U1 means that operand #1 is to be updated, right?

How am I supposed to know that my version is wrong
(and it used to work with older gccs)? I just followed the manual
which states:

> If your assembler instructions access memory in an unpredictable
>fashion, add `memory' to the list of clobbered registers.  This will
>cause GCC to not keep memory values cached in registers across the
>assembler instruction and not optimize stores or loads to that memory.
>You will also want to add the `volatile' keyword if the memory affected
>is not listed in the inputs or outputs of the `asm', as the `memory'
>clobber does not count as a side-effect of the `asm'.  If you know how
>large the accessed memory is, you can add it as input or output but if
>this is not known, you should add `memory'.  As an example, if you
>access ten bytes of a string, you can use a memory input like:
>
> {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>
> Note that in the following example the memory input is necessary,
>otherwise GCC might optimize the store to `x' away:
> int foo ()
> {
>   int x = 42;
>   int *y = &x;
>   int result;
>   asm ("magic stuff accessing an 'int' pointed to by '%1'"
> "=&d" (r) : "a" (y), "m" (*y));
>   return result;
> }


IMHO my example is pretty close to this...

Also, the manual doesn't say that 'm' can only match
memory addresses with side effects; it says 'a memory
operand is allowed, with any kind of address that the
machine supports in general'.
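For reference, Andreas' recipe wrapped in a helper might look like this (load_u32 is a made-up name; the non-PowerPC branch is only there so the sketch builds on any host). On PowerPC, %U1 emits the 'u' (update) suffix when the memory operand has an auto-update address, and %X1 emits 'x' when the address is the indexed reg+reg form, so the instruction always matches what the "m" constraint actually handed it:

```c
#if defined(__powerpc__) || defined(__PPC__)
static inline unsigned load_u32(volatile unsigned *p)
{
    unsigned v;
    /* %U1/%X1 adapt the mnemonic to the address gcc chose for operand 1 */
    asm volatile("lwz%U1%X1 %0, %1" : "=r"(v) : "m"(*p));
    return v;
}
#else
static inline unsigned load_u32(volatile unsigned *p)
{
    return *p; /* plain load on other targets */
}
#endif
```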

Thanks

-- Till


Andreas.

  





inline assembly question (memory side-effects)

2008-05-10 Thread Till Straumann

Hi.

What is the proper way to tell gcc that an
inline assembly statement either modifies
a particular area of memory, or needs that
memory to be up to date because the assembly
reads from it?

E.g., assume I have a

struct blah {
   int sum;
 ...
};

which is accessed by my assembly code.

int my_chksum(struct blah *blah_p)
{
  blah_p->sum = 0;
  /* assembly code computes checksum, writes to blah_p->sum */
  asm volatile("..."::"b"(blah_p));
  return blah_p->sum;
}

How can I tell gcc that the object *blah_p
is accessed (read and/or written to) by the
asm (without declaring the object volatile or
using a general "memory" clobber).
(If I don't take any measures then gcc won't know
that blah_p->sum is modified by the assembly and
will optimize the load operation away and
always return 0.)

A while ago (see this thread:
http://gcc.gnu.org/ml/gcc/2008-03/msg00976.html)
I was told that simply adding the affected
memory area as a memory ('m') output or
input operand is not appropriate -- note that
this contradicts what the gcc info page
suggests (section about 'extended asm':

 "If you know how large the accessed memory
  is, you can add it as input or output..."

along with an example).

Could somebody please clarify? (Please don't suggest how
this particular example could be improved, but try to
answer the fundamental question.)

Thanks
-- Till

PS: please CC me as I'm not subscribed to the gcc list