AVR byte swap optimization

2006-11-17 Thread Shaun Jackman

The following macro expands to some rather frightful code on the AVR:

#define BSWAP_16(x) \
	((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))

uint16_t bswap_16(uint16_t x)
{
  0:9c 01   movw r18, r24
  2:89 2f   mov r24, r25
  4:99 27   eor r25, r25
  6:32 2f   mov r19, r18
  8:22 27   eor r18, r18
return BSWAP_16(x);
}
  a:82 2b   or  r24, r18
  c:93 2b   or  r25, r19
  e:08 95   ret

Ideally, this macro would expand to three mov instructions and a ret.
Is there anything I can do to help GCC along here? I'm using GCC 4.1.0
with -O2.

I won't bother to show bswap_32 here, which produces a real disaster!
Think 47 instructions, for what should be 6.

Cheers,
Shaun

$ avr-gcc --version |head -1
avr-gcc (GCC) 4.1.0


GCC not finding target as

2006-11-18 Thread Shaun Jackman

The GCC build (svn revision 118976) is not finding the target as,
although it appears to be in the right place.

Thanks,
Shaun

$ ../configure --target=avr --enable-languages=c --prefix=/usr
...
$ cat gcc/as
#!/bin/sh
exec  "$@"
$ /usr/bin/avr-as --version | head -1
GNU assembler 2.16.1
$ make
...
make[3]: Entering directory
`/home/sjackman/src/toolchain/gcc-svn/_build-avr/gcc'
/home/sjackman/src/toolchain/gcc-svn/_build-avr/./gcc/xgcc
-B/home/sjackman/src/toolchain/gcc-svn/_build-avr/./gcc/
-B/usr/avr/bin/ -B/usr/avr/lib/ -isystem /usr/avr/include -isystem
/usr/avr/sys-include -O2  -O2 -g -O2  -DIN_GCC -DCROSS_COMPILE   -W
-Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes
-Wold-style-definition  -isystem ./include  -DDF=SF -Dinhibit_libc
-mcall-prologues -Os -g  -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED
-Dinhibit_libc -I. -I. -I../../gcc -I../../gcc/.
-I../../gcc/../include -I../../gcc/../libcpp/include
-I../../gcc/../libdecnumber -I../libdecnumber -DL_mulqi3
-xassembler-with-cpp -c ../../gcc/config/avr/libgcc.S -o
libgcc/./_mulqi3.o
/home/sjackman/src/toolchain/gcc-svn/_build-avr/./gcc/as: line 2:
exec: -o: invalid option
exec: usage: exec [-cl] [-a name] file [redirection ...]
make[3]: *** [libgcc/./_mulqi3.o] Error 1
make[3]: Leaving directory `/home/sjackman/src/toolchain/gcc-svn/_build-avr/gcc'
make[2]: *** [stmp-multilib] Error 2
make[2]: Leaving directory `/home/sjackman/src/toolchain/gcc-svn/_build-avr/gcc'
make[1]: *** [all-gcc] Error 2
make[1]: Leaving directory `/home/sjackman/src/toolchain/gcc-svn/_build-avr'
make: *** [all] Error 2


Optimizing a 16-bit * 8-bit -> 24-bit multiplication

2006-12-01 Thread Shaun Jackman

I would like to multiply a 16-bit number by an 8-bit number and
produce a 24-bit result on the AVR. The AVR has a hardware 8-bit *
8-bit -> 16-bit multiplier.

If I multiply a 16-bit number by a 16-bit number, it produces a 16-bit
result, which isn't wide enough to hold the result.

If I cast one of the operands to 32-bit and multiply a 32-bit number
by a 16-bit number, GCC generates a call to __mulsi3, which is the
routine to multiply a 32-bit number by a 32-bit number and produce a
32-bit result and requires ten 8-bit * 8-bit multiplications.

A 16-bit * 8-bit -> 24-bit multiplication only requires two 8-bit *
8-bit multiplications. A 16-bit * 16-bit -> 32-bit multiplication
requires four 8-bit * 8-bit multiplications.

I could write a mul24_16_8 (16-bit * 8-bit -> 24-bit) function using
unions and 8-bit * 8-bit -> 16-bit multiplications, but before I go
down that path, is there any way to coerce GCC into generating the
code I desire?

Cheers,
Shaun
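
A minimal sketch of the partial-product counts described above, written in portable C rather than AVR assembler. The names mul_8_8, mul24_16_8 and mul32_16_16 are illustrative assumptions, and nothing here promises that GCC maps each helper call onto a single `mul' instruction.

```c
#include <stdint.h>

/* One 8-bit * 8-bit -> 16-bit partial product. Casting an operand to
 * uint16_t keeps the multiplication unsigned where int is 16 bits wide. */
static uint16_t mul_8_8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}

/* 16-bit * 8-bit: two partial products; the result fits in 24 bits. */
uint32_t mul24_16_8(uint16_t a, uint8_t b)
{
    uint32_t lo = mul_8_8((uint8_t)a, b);
    uint32_t hi = mul_8_8((uint8_t)(a >> 8), b);
    return (hi << 8) + lo;
}

/* 16-bit * 16-bit: four partial products; the result needs 32 bits. */
uint32_t mul32_16_16(uint16_t a, uint16_t b)
{
    uint8_t a0 = (uint8_t)a, a1 = (uint8_t)(a >> 8);
    uint8_t b0 = (uint8_t)b, b1 = (uint8_t)(b >> 8);
    uint32_t p00 = mul_8_8(a0, b0);
    uint32_t p01 = mul_8_8(a0, b1);
    uint32_t p10 = mul_8_8(a1, b0);
    uint32_t p11 = mul_8_8(a1, b1);
    return p00 + ((p01 + p10) << 8) + (p11 << 16);
}
```

The 16x8 case drops the two partial products that involve the high byte of the second operand, which is where the saving over the 16x16 case comes from.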


Bizarre inlining type promotion effect

2006-12-04 Thread Shaun Jackman

In the code snippet below, the function mul_8_8 compiles to use
exactly one `mul' instruction on the AVR. The function mul_16_8 calls
mul_8_8 twice. If mul_8_8 is a static inline function and inlined in
mul_16_8, each call generates three `mul' instructions! Why does
inlining mul_8_8 cause each 8x8 multiplication to be promoted to a
16x16 multiplication?

It seems that the inlining mechanism has a real bug if inlining can
cause such a major change in the code generated for a given function.

Cheers,
Shaun

$ avr-gcc --version |head -1
avr-gcc (GCC) 4.1.0
$ cat mul.c
#include <stdint.h>

static uint16_t mul_8_8(uint8_t a, uint8_t b)
{
return a * b;
}

uint32_t mul_16_8(uint16_t a, uint8_t b)
{
uint8_t a0 = a, a1 = a >> 8;
return ((uint32_t)mul_8_8(a1, b) << 8) + mul_8_8(a0, b);
}
$ avr-gcc -c -g -O2 -mmcu=avr4 mul.c
$ avr-objdump -d mul.o

mul.o: file format elf32-avr

Disassembly of section .text:

00000000 <mul_8_8>:
  0:86 9f   mul r24, r22
  2:c0 01   movw r24, r0
  4:11 24   eor r1, r1
  6:08 95   ret

00000008 <mul_16_8>:
  8:bf 92   push r11
  a:cf 92   push r12
  c:df 92   push r13
  e:ef 92   push r14
 10:ff 92   push r15
 12:0f 93   push r16
 14:1f 93   push r17
 16:6c 01   movw r12, r24
 18:b6 2e   mov r11, r22
 1a:8d 2d   mov r24, r13
 1c:99 27   eor r25, r25
 1e:f0 df   rcall .-32 ; 0x0 <mul_8_8>
 20:7c 01   movw r14, r24
 22:00 27   eor r16, r16
 24:11 27   eor r17, r17
 26:10 2f   mov r17, r16
 28:0f 2d   mov r16, r15
 2a:fe 2c   mov r15, r14
 2c:ee 24   eor r14, r14
 2e:6b 2d   mov r22, r11
 30:8c 2d   mov r24, r12
 32:e6 df   rcall .-52 ; 0x0 <mul_8_8>
 34:aa 27   eor r26, r26
 36:bb 27   eor r27, r27
 38:e8 0e   add r14, r24
 3a:f9 1e   adc r15, r25
 3c:0a 1f   adc r16, r26
 3e:1b 1f   adc r17, r27
 40:c8 01   movw r24, r16
 42:b7 01   movw r22, r14
 44:1f 91   pop r17
 46:0f 91   pop r16
 48:ff 90   pop r15
 4a:ef 90   pop r14
 4c:df 90   pop r13
 4e:cf 90   pop r12
 50:bf 90   pop r11
 52:08 95   ret
$ sed -i 's/static/& inline/' mul.c
$ avr-gcc -c -g -O2 -mmcu=avr4 mul.c
$ avr-objdump -d mul.o

mul.o: file format elf32-avr

Disassembly of section .text:

00000000 <mul_16_8>:
  0:ac 01   movw r20, r24
  2:26 2f   mov r18, r22
  4:33 27   eor r19, r19
  6:89 2f   mov r24, r25
  8:99 27   eor r25, r25
  a:82 9f   mul r24, r18
  c:b0 01   movw r22, r0
  e:83 9f   mul r24, r19
 10:70 0d   add r23, r0
 12:92 9f   mul r25, r18
 14:70 0d   add r23, r0
 16:11 24   eor r1, r1
 18:88 27   eor r24, r24
 1a:99 27   eor r25, r25
 1c:98 2f   mov r25, r24
 1e:87 2f   mov r24, r23
 20:76 2f   mov r23, r22
 22:66 27   eor r22, r22
 24:55 27   eor r21, r21
 26:f9 01   movw r30, r18
 28:e4 9f   mul r30, r20
 2a:90 01   movw r18, r0
 2c:e5 9f   mul r30, r21
 2e:30 0d   add r19, r0
 30:f4 9f   mul r31, r20
 32:30 0d   add r19, r0
 34:11 24   eor r1, r1
 36:44 27   eor r20, r20
 38:55 27   eor r21, r21
 3a:62 0f   add r22, r18
 3c:73 1f   adc r23, r19
 3e:84 1f   adc r24, r20
 40:95 1f   adc r25, r21
 42:08 95   ret


Re: Bizarre inlining type promotion effect

2006-12-04 Thread Shaun Jackman

On 12/4/06, Shaun Jackman <[EMAIL PROTECTED]> wrote:

In the code snippet below, the function mul_8_8 compiles to use
exactly one `mul' instruction on the AVR. The function mul_16_8 calls
mul_8_8 twice. If mul_8_8 is a static inline function and inlined in

...

For comparison, a hand-coded 16x8 multiply function requires 11 instructions.

Cheers,
Shaun

mul_16_8:
mul r25, r22
mov r23, r0
mov r25, r1
mul r24, r22
eor r24, r24
mov r22, r0
add r23, r1
adc r24, r25
eor r25, r25
eor r1, r1
ret


Re: AVR byte swap optimization

2006-12-18 Thread Shaun Jackman

On 11/26/06, Denis Vlasenko <[EMAIL PROTECTED]> wrote:

On Saturday 18 November 2006 00:30, Shaun Jackman wrote:
> The following macro expands to some rather frightful code on the AVR:
>
> #define BSWAP_16(x) \
>  ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))

Sometimes gcc is generating better code if you cast
values instead of masking. Try:

  ( (uint8_t)((x) >> 8) | ((uint8_t)(x)) << 8 )


Your suggestion seemed like a good one and gave me some hope.
Unfortunately, it produces identical results to my BSWAP_16 macro. I
also tried the following:

uint8_t a = x >> 8, b = x;
return b << 8 | a;

Different register allocations this time, but identical instructions.

Cheers,
Shaun
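
For reference, the three formulations discussed above can be written as separate functions so their output and generated code can be compared side by side. This is only a sketch: the function names are assumptions, and the explicit uint16_t casts before the left shifts are an addition that keeps the shift unsigned on targets where int is 16 bits wide.

```c
#include <stdint.h>

/* Mask-and-shift form, as in the BSWAP_16 macro. */
uint16_t bswap_16_mask(uint16_t x)
{
    return (uint16_t)(((x >> 8) & 0xff) | ((uint16_t)(x & 0xff) << 8));
}

/* Cast form: truncate to uint8_t instead of masking. */
uint16_t bswap_16_cast(uint16_t x)
{
    return (uint16_t)((uint8_t)(x >> 8) | ((uint16_t)(uint8_t)x << 8));
}

/* Two-temporaries form. */
uint16_t bswap_16_temp(uint16_t x)
{
    uint8_t a = (uint8_t)(x >> 8), b = (uint8_t)x;
    return (uint16_t)((uint16_t)b << 8 | a);
}
```

All three compute the same value; the thread's point is that, with GCC 4.1.0 for AVR, they also compiled to the same instructions.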


Re: [avr-gcc-list] Re: AVR byte swap optimization

2006-12-18 Thread Shaun Jackman

On 12/18/06, Anton Erasmus <[EMAIL PROTECTED]> wrote:

Hi,

Not a macro, but the following seems to generate reasonable code.

...

Thanks Anton,

I came to the same conclusion.

Cheers,
Shaun

static inline uint16_t bswap_16_inline(uint16_t x)
{
union {
uint16_t x;
struct {
uint8_t a;
uint8_t b;
} s;
} in, out;
in.x = x;
out.s.a = in.s.b;
out.s.b = in.s.a;
return out.x;
}

static inline uint32_t bswap_32_inline(uint32_t x)
{
union {
uint32_t x;
struct {
uint8_t a;
uint8_t b;
uint8_t c;
uint8_t d;
} s;
} in, out;
in.x = x;
out.s.a = in.s.d;
out.s.b = in.s.c;
out.s.c = in.s.b;
out.s.d = in.s.a;
return out.x;
}


Re: [avr-gcc-list] Re: AVR byte swap optimization

2006-12-18 Thread Shaun Jackman

On 12/18/06, David VanHorn <[EMAIL PROTECTED]> wrote:

Am I missing something here?
Why not pop to assembler, push the high, push the low, pop the high, pop the
low?


* Inline assembler cannot be used at compile time, for example to
initialize a static variable.

* If the swap function is called on a constant, the compiler cannot
remove the inline assembler. In general, any inline assembler tends to
handcuff the optimizer to some degree.

* Push and pop take two cycles since they access memory. Three mov
instructions are faster than one push and one pop.

Cheers,
Shaun
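
The first point can be illustrated with a small sketch: a macro built only from C operators is a constant expression, so it may initialize a static object, which inline assembler cannot do. The names network_port and get_network_port are hypothetical, and the outer uint16_t cast is an addition to the macro from the start of the thread.

```c
#include <stdint.h>

/* Byte swap written with plain C operators only. */
#define BSWAP_16(x) \
    ((uint16_t)((((x) >> 8) & 0xff) | (((x) & 0xff) << 8)))

/* Evaluated by the compiler; no code runs to compute this value. */
static const uint16_t network_port = BSWAP_16(0x1F90);

uint16_t get_network_port(void)
{
    return network_port;
}
```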


Re: [avr-gcc-list] Re: AVR byte swap optimization

2006-12-18 Thread Shaun Jackman

On 12/18/06, Nils Springob <[EMAIL PROTECTED]> wrote:

Hi,

and it is possible to use an anonymous struct:


True. However, unnamed struct/union fields are an extension of the C
language provided by GCC and not strictly portable. It is a nice
feature though. It would be nice if it crept its way into a future C
standard.

Cheers,
Shaun
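
A sketch of what the anonymous-struct variant might look like, assuming a compiler that accepts unnamed struct members (a GNU extension at the time of this thread, and part of the language since C11). The name bswap_16_anon is an assumption.

```c
#include <stdint.h>

static inline uint16_t bswap_16_anon(uint16_t x)
{
    union {
        uint16_t x;
        struct {        /* unnamed member: a and b are accessed directly */
            uint8_t a;
            uint8_t b;
        };
    } in, out;
    in.x = x;
    out.a = in.b;
    out.b = in.a;
    return out.x;
}
```

The only difference from the named version is that the `.s' step disappears from each member access.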


Re: Bug#300945: romeo: FTBFS (amd64/gcc-4.0): invalid lvalue in assignment

2005-04-02 Thread Shaun Jackman
Can the GCC "C Extension" of "Generalized Lvalues" be enabled with a
switch in gcc-4.0?

Cheers,
Shaun

On Mar 22, 2005 12:29 PM, Andreas Jochens <[EMAIL PROTECTED]> wrote:
> Package: romeo
> Severity: normal
> Tags: patch
> 
> When building 'romeo' on amd64 with gcc-4.0,
> I get the following error:
> 
> assemble.c:581: error: invalid lvalue in assignment
> assemble.c: In function 'MergeIfAdjacent':
> assemble.c:645: error: invalid lvalue in assignment
> assemble.c:647: error: invalid lvalue in assignment
> assemble.c:649: error: invalid lvalue in assignment
> assemble.c: In function 'ROMfree':
> assemble.c:710: error: invalid lvalue in assignment
> assemble.c:720: error: invalid lvalue in assignment
> assemble.c:744: error: invalid lvalue in assignment
> assemble.c:745: error: invalid lvalue in assignment
> assemble.c: In function 'CompareTypeCtor':
> assemble.c:1420: warning: pointer targets in passing argument 1 of 'strncmp' 
> differ in signedness
> assemble.c:1420: warning: pointer targets in passing argument 2 of 'strncmp' 
> differ in signedness
> assemble.c: In function 'SetSystem':
> assemble.c:1504: warning: pointer targets in initialization differ in 
> signedness
> assemble.c:1597: warning: pointer targets in passing argument 1 of 'strlen' 
> differ in signedness
> assemble.c:1606: warning: pointer targets in passing argument 1 of 'strlen' 
> differ in signedness
> make[1]: *** [assemble.o] Error 1
> make[1]: Leaving directory `/romeo-0.5.0'
> make: *** [build-stamp] Error 2
> 
> With the attached patch 'romeo' can be compiled
> on amd64 using gcc-4.0.
> 
> Regards
> Andreas Jochens
[clip patch]


Statement expression with function-call variable lifetime

2005-06-29 Thread Shaun Jackman
Hello,

I'm implementing a tiny vfork/exit implementation using setjmp and
longjmp. Since the function calling setjmp can't return (if you still
want to longjmp to its jmp_buf) I implemented vfork using a statement
expression macro. Here's my implementation of vfork.

jmp_buf *vfork_jmp_buf;
#define vfork() ({ \
int setjmp_ret; \
jmp_buf *prev_jmp_buf = vfork_jmp_buf, new_jmp_buf; \
vfork_jmp_buf = &new_jmp_buf; \
if( (setjmp_ret = setjmp(*vfork_jmp_buf)) != 0 ) \
vfork_jmp_buf = prev_jmp_buf; \
setjmp_ret; \
})

Unfortunately, after tracing a nasty bug I found that the same problem
applies to leaving a statement expression as does returning from a
function. The storage allocated for prev_jmp_buf and new_jmp_buf is
unallocated as soon as we leave the scope of the statement expression.
gcc quickly reuses that stack space later in the same function,
overwriting the saved jmp_buf. Does anyone have a suggestion how I can
allocate some storage space here for prev_jmp_buf and new_jmp_buf that
will last the lifetime of the function call instead of the lifetime of
the statement expression macro? My best idea was to use alloca, but it
wouldn't look pretty. Can someone confirm that memory allocated with
alloca would last the lifetime of the function call, and not the
lifetime of the statement expression?

Please cc me in your reply. Thanks,
Shaun


Re: Statement expression with function-call variable lifetime

2005-06-29 Thread Shaun Jackman
On 6/29/05, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 29, 2005 at 10:34:20AM -0700, Shaun Jackman wrote:
> > the statement expression macro? My best idea was to use alloca, but it
> > wouldn't look pretty. Can someone confirm that memory allocated with
> > alloca would last the lifetime of the function call, and not the
> > lifetime of the statement expression?
> 
> Yes, that's correct (and the only way to do this).

Great! I've rewritten the vfork macro to use alloca. Here's the big
catch, the pointer returned by alloca is stored on the stack, and it
gets trashed upon leaving the statement expression. Where can I store
the pointer returned by alloca? Please don't say allocate some memory
for the pointer using alloca.  My best idea so far was to
make the pointer returned by alloca (struct context *c, see below)
static, so that each time the macro is invoked, a little static memory
is put aside for it. I think this would work, but it doesn't seem to
me to be the best solution.

Here's the revised vfork macro that uses alloca:

jmp_buf *vfork_jmp_buf;

struct context {
jmp_buf *prev_jmp_buf;
jmp_buf new_jmp_buf;
};

# define vfork() ({ \
int setjmp_ret; \
struct context *c = alloca(sizeof *c); \
c->prev_jmp_buf = vfork_jmp_buf; \
vfork_jmp_buf = &c->new_jmp_buf; \
if( (setjmp_ret = setjmp(*vfork_jmp_buf)) != 0 ) \
vfork_jmp_buf = c->prev_jmp_buf; \
setjmp_ret; \
})

Cheers,
Shaun


Re: Statement expression with function-call variable lifetime

2005-06-29 Thread Shaun Jackman
On 6/29/05, Russell Shaw <[EMAIL PROTECTED]> wrote:
> Alloca is like creating a stack variable, except it just gives you some
> generic bytes that don't mean anything. Exiting the local scope will
> trash the local variables and anything done with alloca(). You'll need
> to store some information in a global variable. This C exceptions method
> stores things as a linked list in nested stack frames and keeps a pointer
> to the list in a global variable. Well worth studying:
> 
>http://ldeniau.home.cern.ch/ldeniau/html/exception/exception.html

Thanks! That link was a very good reference. Here's the new and
improved vfork macro. It works nicely! If anyone's curious enough, I
can post the exit and waitpid macros that accompany it. I'm using
these macros to implement a Busybox system that does not need a fork
system call.

Thanks for all your help, Daniel and Russell! Cheers,
Shaun

struct vfork_context {
struct vfork_context *prev;
jmp_buf jmp;
} *vfork_context;

# define vfork() ({ \
int setjmp_ret; \
struct vfork_context *c = alloca(sizeof *c); \
c->prev = vfork_context; \
vfork_context = c; \
if( (setjmp_ret = setjmp(c->jmp)) != 0 ) \
vfork_context = vfork_context->prev; \
setjmp_ret; \
})
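
The exit macro mentioned above was not posted in this message. As a rough sketch of how the pieces might fit together, the following pairs the same macro (renamed my_vfork to avoid clashing with the libc vfork) with a hypothetical my_exit that simply longjmps back to the innermost context. Both my_exit and spawn_once are assumptions, not the author's code. It relies on GNU statement expressions and alloca.

```c
#include <alloca.h>
#include <setjmp.h>
#include <stddef.h>

struct vfork_context {
    struct vfork_context *prev;
    jmp_buf jmp;
};
static struct vfork_context *vfork_context;

/* Same shape as the macro above: push a context that lives until the
 * calling function returns, and pop it when longjmp brings us back. */
#define my_vfork() ({ \
    int setjmp_ret; \
    struct vfork_context *c = alloca(sizeof *c); \
    c->prev = vfork_context; \
    vfork_context = c; \
    if ((setjmp_ret = setjmp(c->jmp)) != 0) \
        vfork_context = vfork_context->prev; \
    setjmp_ret; \
})

/* Hypothetical counterpart: resume the "parent" with a nonzero value. */
#define my_exit(pid) longjmp(vfork_context->jmp, (pid))

int spawn_once(void)
{
    int pid = my_vfork();
    if (pid == 0) {
        /* "child" branch, running on the parent's stack */
        my_exit(42);
    }
    return pid;     /* "parent" branch sees the value passed to my_exit */
}
```

Because the context comes from alloca rather than from a block-scoped variable, it stays valid after the statement expression ends, which is the property the thread was after.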


memcpy to an unaligned address

2005-08-02 Thread Shaun Jackman
In a typical Ethernet/IP ARP header the source IP address is
unaligned. Instead of using...
out->srcIPAddr = in->dstIPAddr;
... I used...
memcpy(&out->srcIPAddr, &in->dstIPAddr, sizeof(uint32_t));
... to account for the unaligned destination. This worked until gcc 4,
which now generates a simple load/store.
ldr r3, [r6, #24]
adds r2, r4, #0
adds r2, #14
str r3, [r2, #0]
A nice optimisation, but in this case it's incorrect. $r4 is aligned,
and the result of adding #14 to $r4 is an unaligned pointer.

Should gcc know better, or do I need to give it a little more
information to help it out?

Please cc me in your reply. Cheers,
Shaun


Re: memcpy to an unaligned address

2005-08-02 Thread Shaun Jackman
On 8/2/05, Dave Korn <[EMAIL PROTECTED]> wrote:
>   In order for anyone to answer your questions about the alignment of
> various types in a struct, don't you think you should perhaps have told us a
> little about what those types actually are and how the struct is laid out?

Of course, my apologies. I was clearly overly terse. I declare the
structure packed as follows:

typedef struct {
uint16_t a;
uint32_t b;
} __attribute__((packed)) st;

void foo(st *s, int n)
{
memcpy(&s->b, &n, sizeof n);
}

This code generates the unaligned store:
$ arm-elf-objdump -d packed.o
...
   0:   e24dd004    sub sp, sp, #4  ; 0x4
   4:   e5801002    str r1, [r0, #2]
   8:   e28dd004    add sp, sp, #4  ; 0x4
   c:   e12fff1e    bx  lr
$ arm-elf-gcc --version | head -1
arm-elf-gcc (GCC) 4.0.1

Cheers,
Shaun


Re: memcpy to an unaligned address

2005-08-02 Thread Shaun Jackman
On 8/2/05, Paul Koning <[EMAIL PROTECTED]> wrote:
> One of the things that continues to baffle me (and my colleagues) is
> the bizarre way in which attributes such as "packed" work when applied
> to structs.
> 
> It would be natural to assume, as Shaun did, that marking a struct
> "packed" (or, for that matter, "packed,aligned(2)") would apply that
> attribute to the fields of the struct.

This is exactly the behaviour suggested by the info docs:

$ info gcc 'C Ext' 'Type Attr'
...
 Specifying this attribute for `struct' and `union' types is
 equivalent to specifying the `packed' attribute on each of the
 structure or union members.

Cheers,
Shaun


Re: memcpy to an unaligned address

2005-08-02 Thread Shaun Jackman
On 8/2/05, Dave Korn <[EMAIL PROTECTED]> wrote:
>   There are two separate issues here:
> 
> 1)  Is the base of the struct aligned to the natural alignment, or can the
> struct be based at any address

The base of the struct is aligned to the natural alignment, four bytes
in this case.
 
> 2)  Is there padding between the struct members to maintain their natural
> alignments (on the assumption that the struct's base address is aligned.)

There is no padding. The structure is defined as
__attribute__((packed)) to explicitly remove the padding. The result
is that gcc knows the unaligned four byte member is at an offset of
two bytes from the base of the struct, but uses a four byte load at
the unaligned address of base+2. I don't expect...
p->unaligned = n;
... to work, but I definitely expect
memcpy(&p->unaligned, &n, sizeof p->unaligned);
to work. The second case is being optimised to the first case though
and generating an unaligned store.

Cheers,
Shaun


Re: memcpy to an unaligned address

2005-08-02 Thread Shaun Jackman
On 8/2/05, Joe Buck <[EMAIL PROTECTED]> wrote:
> I suppose we could make & on an unaligned project return a void*.  That
> isn't really right, but it would at least prevent the cases that we know
> don't work from compiling.

That sounds like a dangerous idea only because I'd expect...
int *p = &packed_struct.unaligned_member;
... to fail if unaligned_member is not an int, but if the & operator
returns a void*, it would suddenly become very permissive.

Cheers,
Shaun


Re: memcpy to an unaligned address

2005-08-02 Thread Shaun Jackman
On 8/2/05, Paul Koning <[EMAIL PROTECTED]> wrote:
> It sounds like the workaround is to avoid memcpy, and just use
> variable assignment.  Alternatively, cast the pointers to char*, which
> should force memcpy to do the right thing.  Ugh.

I swear originally, back in the gcc 2.95 days, I used memcpy because
the memcpy function checked for unaligned pointers, whereas storing to
and loading from unaligned variables generated a simple store/load
instruction which wouldn't work. It seems the tables have turned and
the exact opposite is true now with gcc 4, where memcpy doesn't work,
but unaligned variables do. I believe gcc 3 behaved the same as gcc 2
-- memcpy worked, unaligned variables didn't work. Can someone confirm
this summary is correct?

It seems to me there's an argument for a _memcpy_unaligned(3)
function, as ugly as that is.

Cheers,
Shaun
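
One possible shape for such a function is a plain byte-wise copy, sketched below. The name memcpy_unaligned is hypothetical (the leading underscore is dropped because such names are reserved), and the volatile qualifier is only an attempt to discourage the compiler from merging the byte stores into a wider one; whether a given GCC version leaves the loop alone is an assumption to verify on the target.

```c
#include <stddef.h>

/* Copy n bytes one at a time, making no alignment assumptions. */
void *memcpy_unaligned(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```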


Re: memcpy to an unaligned address

2005-08-02 Thread Shaun Jackman
On 8/2/05, Paul Koning <[EMAIL PROTECTED]> wrote:
> It sounds like the workaround is to avoid memcpy, and just use
> variable assignment.  Alternatively, cast the pointers to char*, which
> should force memcpy to do the right thing.  Ugh.

Casting to void* does not work either. gcc keeps the alignment
information -- but not the *unalignment* information, if that
distinction makes any sense -- of a particular variable around as long
as it can, through casts and even through assignment. The unalignment
information, on the other hand, is lost immediately after the &
operator. None of these examples produce an unaligned load:

memcpy(&s->b, &n, sizeof n);

memcpy((void*)&s->b, &n, sizeof n);

void *p = &s->b;
memcpy(p, &n, sizeof n);

But as pointed out by others, this does produce an unaligned load:

s->b = n;

Cheers,
Shaun
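
Given that observation, one workaround is to keep every access to the member going through the packed struct type, so the compiler still knows the member may be unaligned. A minimal sketch, reusing the struct from earlier in the thread; the accessor names store_b and load_b are assumptions.

```c
#include <stdint.h>

typedef struct {
    uint16_t a;
    uint32_t b;     /* sits at offset 2, so it is not 4-byte aligned */
} __attribute__((packed)) st;

/* Direct member access: the packed attribute is visible to the compiler. */
void store_b(st *s, uint32_t n)
{
    s->b = n;
}

uint32_t load_b(const st *s)
{
    return s->b;
}
```

Taking &s->b and passing it elsewhere is the step that drops the packed information, so these accessors avoid forming that pointer.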


Re: memcpy to an unaligned address

2005-08-02 Thread Shaun Jackman
On 8/2/05, Shaun Jackman <[EMAIL PROTECTED]> wrote:
> operator. None of these examples produce an unaligned load:

I should clarify the wording I'm using here. By "an unaligned load" I
mean code to safely load from an unaligned pointer.

Cheers,
Shaun


Re: memcpy to an unaligned address

2005-08-03 Thread Shaun Jackman
On 8/3/05, Richard Henderson <[EMAIL PROTECTED]> wrote:
> It is nevertheless correct.  Examine all of the parts of the expression.
> 
> In particular, "&s->b".  What type does it have?  In an ideal world, it
> would be "pointer to unaligned integer".  But we have no such type in
> our type system, so it is "pointer to integer".  This expression is ONLY
> THEN passed to memcpy.  At which point we query the argument for its
> alignment, and get the non-intuitive result.
> 
> If you instead pass "s" to memcpy, you should get the correct unaligned
> copy.  If that isn't happening, that's a bug.

I'm not sure I understood the last line. s is a structure, and its
address is aligned. How would you pass it to memcpy, and why would it
generate an unaligned copy?

Cheers,
Shaun


Re: memcpy to an unaligned address

2005-08-04 Thread Shaun Jackman
On 8/4/05, Carl Whitwell <[EMAIL PROTECTED]> wrote:
> Hi, 
> thought I'd drop you a mail, would put it on gcc mailing list but
> haven't got time to work out how to send it there at this moment. 

The gcc mailing list is [EMAIL PROTECTED]
   
> All testing here is done on x86 processors using gcc under cygwin. 

Are you using an x86 host and an arm target?

> Testing gcc 3.3.4 showed no problems with memcpy alignment.  Using your
> example, the structure s was aligned but the member variable b was
> unaligned. The assembler code produced for the direct copy s->a = p and the
> memcpy replacement were identical. 

Did it produce an open-coded memcpy, and was it correct?

> Testing gcc 4.0.1 I found the structure s was unaligned such that the member
> variable b was aligned, which was odd. 
> I discovered that the structure appears to be aligned based upon the natural
> alignment of the last element within that structure. 
> Of course this means that the memcpy is now acting on an aligned member
> rather than an unaligned member, and works perfectly well. 

Interesting. Can you post an assembler snippet of this?

> This means that to cause an unaligned element within the structure I had to
> add another element to the structure and retest. 
> On doing this though I found gcc appeared to correctly replace memcpy with
> an unaligned copy. 
>   
> I completely agree that gcc should be handling all the alignment issues
> here, but I'm not sure what it thinks it's doing moving the structure about
> in 4.0.1 
> It may be worth tracking the addresses of the members in all the tests to
> make sure the tests are comparable across gcc versions. 
>   
> Regards, 
> Carl Whitwell 

Cheers,
Shaun


Re: memcpy to an unaligned address

2005-08-05 Thread Shaun Jackman
On 8/5/05, Carl Whitwell <[EMAIL PROTECTED]> wrote:
> On 8/4/05, Shaun Jackman <[EMAIL PROTECTED]> wrote:
> > Are you using an x86 host and an arm target?
> 
> Actually no, my major concern at the time was the large quantity of
> legacy code with packed structures that we have on an embedded linux
> x86 system. I was just testing that we didn't have an issue there with
> the structure access.

The x86 includes hardware to fetch a word from an unaligned address by
fetching from two aligned addresses, then shifting and combining to
produce the correct result. So unaligned accesses on the x86 might
incur a slight performance hit, but they will still behave correctly.
The ARM on the other hand does not include such hardware, so a load
instruction fetching from an unaligned address will behave
incorrectly. My primary concern is really just code correctness.

Cheers,
Shaun


ICE in first_insn_after_basic_block_note

2005-09-29 Thread Shaun Jackman
The Debian HPPA buildd is failing with an ICE when building SwingWT
for the HPPA.

Here's the full build log:
http://buildd.debian.org/fetch.php?&pkg=swingwt&ver=0.87-2&arch=hppa&stamp=1126289145&file=log&as=raw

Here's the interesting line:
swingwtx/swing/JOptionPane.java:325: internal compiler error: in
first_insn_after_basic_block_note, at flow.c:349

GCJ 4.0.1 is being used.

Cheers,
Shaun


ARM spurious load

2005-12-01 Thread Shaun Jackman
The following code snippet produces code that loads a register, r5,
from memory, but never uses the value. The code is correct though, so
not a major issue. In addition, it never uses r3 or r12, which are
"free" registers, in that they don't have to be saved by the callee.
For a one line function that essentially returns a constant, the
generated code spills a lot of registers that it needn't. The
generated code ought to resemble "ldr r0, =constant0; ldr r1,
=constant1; bx lr". This code was generated using arm-elf-gcc -mthumb
-O2.

Please cc me in your reply. Cheers,
Shaun

arm-elf-gcc (GCC) 4.0.2

#define USER_STACK 0x210

static long long rdp_getenv(void)
{
// Return the command line in r0 and the stack in r1.
return ((long long unsigned)USER_STACK << 32) | (int)"";
}

02001644 <rdp_getenv>:
 2001644:   b570    push {r4, r5, r6, lr}
 2001646:   4a04    ldr r2, [pc, #16]   (2001658 <.text+0x1658>)
 2001648:   4d04    ldr r5, [pc, #16]   (200165c <.text+0x165c>)
 200164a:   4e05    ldr r6, [pc, #20]   (2001660 <.text+0x1660>)
 200164c:   17d4    asrs r4, r2, #31
 200164e:   1c21    adds r1, r4, #0
 2001650:   4331    orrs r1, r6
 2001652:   1c10    adds r0, r2, #0
 2001654:   bd70    pop {r4, r5, r6, pc}
[the literal pool follows here]


Disabling the heuristic inliner

2009-11-18 Thread Shaun Jackman
Is it possible to disable the heuristic inline function logic? I would
like the following behaviour:

* static inline functions are always inlined
* non-static functions are never inlined
* static functions that are called once are inlined
* static functions that are called more than once are not inlined

I'm using -Os, and I'm trying to find the right combination of -f
switches to enable this behaviour.

Thanks,
Shaun


Re: Disabling the heuristic inliner

2009-11-18 Thread Shaun Jackman
2009/11/18 Shaun Jackman :
> Is it possible to disable the heuristic inline function logic? I would
> like the following behaviour:
>
> * static inline functions are always inlined
> * non-static functions are never inlined
> * static functions that are called once are inlined
> * static functions that are called more than once are not inlined
>
> I'm using -Os, and I'm trying to find the right combination of -f
> switches to enable this behaviour.
>
> Thanks,
> Shaun

I haven't done a lot of testing, but
-Os -fno-inline-small-functions
seems to accomplish this.

Cheers,
Shaun
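
Where command-line switches are too coarse, GCC's always_inline and noinline function attributes give per-function control instead. The sketch below is only an illustration of those two attributes; the function names are made up, and it does not reproduce the "called once" rule, which stays with the optimizer.

```c
/* Always inlined into its callers, regardless of the size heuristics. */
static inline __attribute__((always_inline)) int twice(int x)
{
    return 2 * x;
}

/* Never inlined, even though it is static and small. */
static __attribute__((noinline)) int add_one(int x)
{
    return x + 1;
}

int compute(int x)
{
    return add_one(twice(x)) + add_one(x);
}
```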


avr: optimizing assignment to a bit field

2009-11-28 Thread Shaun Jackman
When assigning a bool to a single bit of a bitfield located in the
bit-addressable region of memory, better code is produced by
if (flag)
bitfield.bit = true;
else
bitfield.bit = false;
than
bitfield.bit = flag;

I've included a short test and the assembler output by both forms.
Should I file a bug suggesting a possible improvement here?

Cheers,
Shaun

#include <stdbool.h>
#include <stdint.h>

struct byte { uint8_t x0:1; uint8_t x1:1; uint8_t x2:1; uint8_t x3:1;
uint8_t x4:1; uint8_t x5:1; uint8_t x6:1; uint8_t x7:1; };

volatile struct byte *const porte = (void*)0x23;

void set_flag_good(bool flag)
{
if (flag)
porte->x6 = true;
else
porte->x6 = false;
}

void set_flag_bad(bool flag)
{
porte->x6 = flag;
}


00000000 <set_flag_good>:
   0:   88 23   and r24, r24
   2:   01 f4   brne .+0 ; 0x4 <set_flag_good+0x4>
2: R_AVR_7_PCREL .text+0x8
   4:   1e 98   cbi 0x03, 6 ; 3
   6:   08 95   ret
   8:   1e 9a   sbi 0x03, 6 ; 3
   a:   08 95   ret

0000000c <set_flag_bad>:
   c:   81 70   andi r24, 0x01   ; 1
   e:   82 95   swap r24
  10:   88 0f   add r24, r24
  12:   88 0f   add r24, r24
  14:   80 7c   andi r24, 0xC0   ; 192
  16:   93 b1   in  r25, 0x03   ; 3
  18:   9f 7b   andi r25, 0xBF   ; 191
  1a:   98 2b   or  r25, r24
  1c:   93 b9   out 0x03, r25   ; 3
  1e:   08 95   ret


Re: avr: optimizing assignment to a bit field

2009-11-28 Thread Shaun Jackman
2009/11/28 Richard Guenther :
> On Sat, Nov 28, 2009 at 11:43 PM, Shaun Jackman  wrote:
>> When assigning a bool to a single bit of a bitfield located in the
>> bit-addressable region of memory, better code is produced by
>>        if (flag)
>>                bitfield.bit = true;
>>        else
>>                bitfield.bit = false;
>> than
>>        bitfield.bit = flag;
>>
>> I've included a short test and the assembler output by both forms.
>> Should I file a bug suggesting a possible improvement here?
>
> Yes, a bugreport is useful - but there might be a bug with this issue
> already.

Reported here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42210

Cheers,
Shaun


Re: "R_ARM_GOTOFF32" problem on uClinux

2008-12-01 Thread Shaun Jackman
2008/12/1 K.J. Kuan-Jau Lin(林寬釗) <[EMAIL PROTECTED]>:
> Hi Shaun,
>
> I have trouble in uClinux world.
> After long and painful debugging, i found it is due to the "R_ARM_GOTOFF32"
> relocation type.
> By google, i found you got the same problem before.
> (http://mailman.uclinux.org/pipermail/uclinux-dev/2006-June/039089.html).
> I knew you have tried the gcc-4.0.x & gcc-4.1.x toolchains.
> My toolchian is gcc-4.2.4 and seems still have the same problem.
> Besides changing to gcc-4.0.x, did you have any workaround solutions?
> Any hints will be very appreciated.

Hi KJ,

I never did find a solution. Only two cases in particular bit me:

* string constants
* addresses of static functions passed as arguments

The first I worked around by placing string constants in the
read-write memory (RODATA after the GOT) rather than the read-only
memory (RODATA after TEXT), although this does use RAM unnecessarily.
I never did fix the second issue.

A possible quick hack, although not a solution, would be to find
wherever GCC decides to emit a GOTOFF32 relocation and use a GOT32
relocation instead.

This issue has a bug report in bugzilla:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28194
I've cc'ed the GCC mailing list.

Cheers,
Shaun


Declaration after a label

2006-03-31 Thread Shaun Jackman
GCC reports an error for this snippet:

int main()
{
foo:
int x;
}

foo.c:4: error: expected expression before 'int'

... but not this snippet:

int main()
{
foo:
(void)0;
int x;
}

Is this expected behaviour? At the very least, it seems like an
unusual distinction.

Please cc me in your reply. Cheers,
Shaun
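
The distinction follows from the C99 grammar: a label must be attached to a statement, and a declaration is not a statement. Any statement after the label will do, including an empty one, as in this small sketch (the function name is an assumption). Later revisions of the standard relax the rule, C23 as far as I know.

```c
int labelled_decl(void)
{
    goto foo;
foo:
    ;               /* null statement: gives the label something to attach to */
    int x = 5;      /* the declaration now follows a statement, not the label */
    return x;
}
```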


mthumb: generate a tail-call

2006-05-10 Thread Shaun Jackman

What optimisation option is needed to prod arm-elf-gcc -mthumb to
generate a tail call? ARM works as expected.

Please cc me in your reply. Thanks!
Shaun

arm-elf-gcc (GCC) 4.1.0

$ cat 
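
For reference, a function of the following shape is a tail-call candidate: the call to `callee` is the last thing `caller` does and its result is returned unchanged. This is only an illustration with made-up names, not the test file from the post. GCC's switch for this transformation is -foptimize-sibling-calls; whether the Thumb back end of a given release acts on it is the open question here.

```c
/* noinline keeps the call visible so the generated code can be inspected. */
__attribute__((noinline)) int callee(int x)
{
    return x + 1;
}

/* The call below is in tail position: nothing happens after it returns. */
int caller(int x)
{
    return callee(x * 2);
}
```

Compiling with -O2 -S, once for ARM and once with -mthumb, and checking whether `caller` ends in `b callee` or `bl callee` shows whether the tail call was made.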

The default -mpic-register for -mthumb

2006-06-26 Thread Shaun Jackman

The default -mpic-register (on ARM/Thumb) is r10. In -mthumb mode, r10
is not available to the math instructions as a direct argument. On top
of that, preserving r10 complicates the function prologue. Does it
make more sense to use a directly accessible register, r7 for example,
as the default -mpic-register when in -mthumb mode? Can a
non-preserved register, such as r3, be used instead? (assuming the
compiler saves the function argument in r3 somewhere)

Why is the PIC register fixed? Could the compiler register allocation
logic be allowed to allocate the _GLOBAL_OFFSET_TABLE_ pointer the
same as any other variable? In which case it would probably realise
that
cost(r3) < cost(r7) < cost(r10)
at least in the case where r3 isn't being used by a function argument.

Cheers,
Shaun


Re: The default -mpic-register for -mthumb

2006-06-26 Thread Shaun Jackman

On 6/26/06, Richard Earnshaw <[EMAIL PROTECTED]> wrote:

As of gcc-4.2 it isn't fixed, it's just like any other pseudo generated
by the compiler.


Glad to hear it!

Thanks,
Shaun


Specifying a MULTILIB dependency

2006-06-26 Thread Shaun Jackman

I'm using MULTILIB_OPTIONS and MULTILIB_DIRNAMES to compile a PIC/XIP
toolchain. I'm familiar with the MULTILIB_EXCEPTIONS mechanism to
specify incompatible configurations. How, though, do I indicate that
msingle-pic-base depends on fPIC?

MULTILIB_OPTIONS    += fPIC
MULTILIB_DIRNAMES   += pic
MULTILIB_OPTIONS    += msingle-pic-base
MULTILIB_DIRNAMES   += xip

Please cc me in your reply. Thanks,
Shaun


Testing MULTILIB configurations

2006-06-26 Thread Shaun Jackman

After I've modified the MULTILIB options in t-arm-elf, what's the
fastest way to test the new configuration without rebuilding the
entire toolchain? Right now the best method I have is `make clean-gcc
all-gcc', which is admittedly quite slow.

Please cc me in your reply. Thanks!
Shaun


Re: Specifying a MULTILIB dependency

2006-06-26 Thread Shaun Jackman

On 26 Jun 2006 14:04:36 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

The usual hacked up way is to use MULTILIB_EXCEPTIONS to remove
-msingle-pic-base without -fPIC.  Something like

MULTILIB_EXCEPTIONS = -msingle-pic-base

might do it.


I tried your suggestion, but it didn't seem to have the desired
effect. With my limited understanding of how MULTILIB_EXCEPTIONS
works, I thought the following attempt might have a chance:

MULTILIB_EXCEPTIONS += *!fPIC*/*msingle-pic-base*

No luck though.

Cheers,
Shaun


Re: Specifying a MULTILIB dependency

2006-06-26 Thread Shaun Jackman

On 26 Jun 2006 14:42:20 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

No, that wouldn't work.  MULTILIB_EXCEPTIONS takes a shell glob
pattern.  It is invoked for each option set which is going to be
generated.  I would expect that one of the option sets would be simply
"-msingle-pic-base".  So it seems to me that MULTILIB_EXCEPTIONS can
be made to work, if you play with the syntax.  However, if it can not
be made to work for some reason, then I don't know of anything else
which will do what you want.  Sorry.


Reading through gcc/genmultilib, it looks as though
MULTILIB_EXCLUSIONS can take a '!' parameter, but MULTILIB_EXCEPTIONS
cannot.

Here's another failed experiment:

MULTILIB_OPTIONS    += fno-pic/fPIC
MULTILIB_DIRNAMES   += nopic pic
#
MULTILIB_OPTIONS    += mno-single-pic-base/msingle-pic-base
MULTILIB_DIRNAMES   += noxip xip
MULTILIB_EXCEPTIONS += *fno-pic*/*msingle-pic-base*

I wish I knew what text the shell glob pattern was globbing against.
It's clearly not using the resulting directory structure, since the
EXCEPTIONS use the OPTIONS names and not the DIRNAMES.

Cheers,
Shaun


Re: Specifying a MULTILIB dependency

2006-06-26 Thread Shaun Jackman

On 6/26/06, Shaun Jackman <[EMAIL PROTECTED]> wrote:

Reading through gcc/genmultilib, it looks as though
MULTILIB_EXCLUSIONS can take a '!' parameter, but MULTILIB_EXCEPTIONS
cannot.


The solution was to use MULTILIB_EXCLUSIONS!

MULTILIB_EXCLUSIONS += !fPIC/msingle-pic-base

Yeeha!

Cheers,
Shaun
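
Going by the comments in gcc/genmultilib, an exclusion rule is a '/'-separated list of options that are ANDed together, with '!' negating an option. Under that reading, `!fPIC/msingle-pic-base` matches "not fPIC and msingle-pic-base". The C fragment below is a toy model of that rule, not GCC code; the function names are invented.

```c
#include <stdbool.h>

/* Toy model of the rule "!fPIC/msingle-pic-base": true when the option
   combination matches the rule and should therefore be skipped. */
bool excluded(bool fpic, bool single_pic_base)
{
    return !fpic && single_pic_base;
}

/* Count how many of the four option combinations survive the rule. */
int surviving_multilibs(void)
{
    int count = 0;
    for (int fpic = 0; fpic <= 1; fpic++)
        for (int spb = 0; spb <= 1; spb++)
            if (!excluded(fpic != 0, spb != 0))
                count++;
    return count;
}
```

Only the combination with msingle-pic-base but without fPIC is dropped, which leaves three of the four variants.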


Re: Testing MULTILIB configurations

2006-06-26 Thread Shaun Jackman

On 6/26/06, Shaun Jackman <[EMAIL PROTECTED]> wrote:

After I've modified the MULTILIB options in t-arm-elf, what's the
fastest way to test the new configuration without rebuilding the
entire toolchain? Right now the best method I have is `make clean-gcc
all-gcc', which is admittedly quite slow.

Please cc me in your reply. Thanks!
Shaun


The best method I found was to remove gcc/s-mlib and
gcc/stmp-multilib, rebuild xgcc (make -C gcc xgcc), and run xgcc
-print-multi-lib to find the new multilib configuration.

Cheers,
Shaun


XIP on an ARM processor (R_ARM_GOTOFF32)

2006-06-27 Thread Shaun Jackman

I'm attempting to build an XIP "Hello, world!" application for the ARM
processor. I'm compiling with -fPIC -msingle-pic-base with the default
-mpic-register=r10. The layout of the memory map is such that the
.text and .rodata are in flash memory, and the .got, .data and so
forth are loaded into RAM. r10 is loaded with the address of the .got.

I'm having trouble with string constants. The address of a string
constant is being resolved using a R_ARM_GOTOFF32 relocation. For
example...

  d6:   4803    ldr r0, [pc, #12]   (e4 <.text+0xe4>)
  d8:   4450    add r0, sl
...
  e4:   ffdc    undefined
        e4: R_ARM_GOTOFF32  .LC0

The address to the string constant is being specified as a negative
offset from the GOT. However, in this memory layout the GOT does not
immediately follow the .text section, so a negative offset from the
GOT expecting to find the .text section does not make any sense. A PC
relative relocation (R_ARM_PC24), on the other hand, should work. The
address of the string constant would be specified as a positive offset
from the program counter, rather than a negative offset from the GOT.
Is there a compiler option to force this behaviour?

Please cc me in your reply. Thanks,
Shaun
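
The arithmetic behind the failure can be written out. R_ARM_GOTOFF32 stores the link-time difference between the symbol and the GOT, and the `add r0, sl` adds the run-time GOT address to it. The model below is a sketch with invented names and made-up addresses; it only shows that the result is right when the distance between .text and .got is the same at run time as at link time.

```c
#include <stdint.h>

/* Hypothetical section addresses, chosen only for illustration. */
struct layout {
    uint32_t text_base; /* where .text/.rodata start */
    uint32_t got_base;  /* where .got starts */
};

/* Link-time constant placed in the literal pool: symbol minus GOT.
   Unsigned arithmetic wraps, giving the "negative offset" as a
   large 32-bit value. */
uint32_t gotoff_value(struct layout link, uint32_t sym_offset_in_text)
{
    return (link.text_base + sym_offset_in_text) - link.got_base;
}

/* What the add of the PIC register computes at run time. */
uint32_t gotoff_resolve(struct layout run, uint32_t gotoff)
{
    return run.got_base + gotoff;
}

/* Where the symbol really is at run time. */
uint32_t actual_address(struct layout run, uint32_t sym_offset_in_text)
{
    return run.text_base + sym_offset_in_text;
}
```

With .text in flash and .got in RAM, the two sections move independently, so `gotoff_resolve` and `actual_address` disagree.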


Re: [uClinux-dev] XIP on an ARM processor (R_ARM_GOTOFF32)

2006-06-27 Thread Shaun Jackman

On 6/27/06, Erwin Authried <[EMAIL PROTECTED]> wrote:

Hi,

which compiler/binutil version did you use? Could you post the source
that you used?
One other thing (although that doesn't seem to have to do with your
problem): It is important that you use -fpic (not -fPIC) so that the
correct multilib library is used. For the same reason, -fpic must be
specified in CFLAGS as well as in LDFLAGS if you use a Makefile.

Regards,
Erwin


Hello Erwin,

The source is simply...

int main() { puts("Hello, world!"); return 0; }

I'm using binutils 2.17 and gcc 4.1.1.

The issue-raising difference is that I attempted to place .rodata
after the .text segment in flash, rather than with the .data segment
in RAM. This should work, except that GCC is creating GOT-relative
references to the symbols in .rodata rather than PC-relative
references. The GOT-relative reference does not work if the .rodata
symbol is with the .text segment instead of with the .data segment.

Cheers,
Shaun


Re: XIP on an ARM processor (R_ARM_GOTOFF32)

2006-06-27 Thread Shaun Jackman

On 6/27/06, Shaun Jackman <[EMAIL PROTECTED]> wrote:

I'm attempting to build an XIP "Hello, world!" application for the ARM
processor. I'm compiling with -fPIC -msingle-pic-base with the default
-mpic-register=r10. The layout of the memory map is such that the
.text and .rodata are in flash memory, and the .got, .data and so
forth are loaded into RAM. r10 is loaded with the address of the .got.


A workaround is to place the .rodata section after the .got with the
.data section rather than with the .text section. This workaround loses
the advantage of keeping the read-only data in flash, instead of
copying it to RAM.

Cheers,
Shaun


Re: [uClinux-dev] Re: XIP on an ARM processor (R_ARM_GOTOFF32)

2006-06-27 Thread Shaun Jackman

On 6/27/06, David McCullough <[EMAIL PROTECTED]> wrote:

Are you using the ld-elf2flt/elf2flt.ld combo ?

It lays things out in a known way and has a '-move-rodata' option which
will put the rodata in with the .text if it contains no relocation info
needed at runtime.

Something like this on the link line:

-gcc  -Wl,-move-rodata 

The uClinux-dist has this as the default for !MMU systems that can do
XIP,

Cheers,
Davidm


I'm not using ld-elf2flt, but I am using elf2flt.ld.

arm-elf-gcc -mthumb -fPIC -msingle-pic-base -Wl,-q -Telf2flt.ld hello.c -o hello
arm-elf-elf2flt -adp hello hello -o hello.bflt

The ld script that puts .rodata in .text fails, due to GCC referencing
symbols in .rodata with a GOTOFF32 relocation. So, I used the other
linker script instead, which put .rodata in .data after the .got. This
got me as far as an XIP "Hello, world!" working, which is thrilling!

Unfortunately, a more complex application, busybox, which uses bsearch
and thus a callback function, fails because GCC loads the address for
the callback function using a GOTOFF32 relocation. In the XIP case
where .text and .data are separated and .got does not immediately
follow .text, a GOTOFF32 relocation to a symbol in the .text segment
is super broken.

    bc84:   4b05    ldr r3, [pc, #20]   (bc9c <.text+0xbc9c>)
    bc86:   b081    sub sp, #4
    bc88:   4453    add r3, sl
    bc8a:   9300    str r3, [sp, #0]
...
    bc9c:   2461    movs r4, #97
            bc9c: R_ARM_GOTOFF32    applet_name_compare

Loading the address of the callback function with either a PC relative
relocation or a GOT (not GOTOFF) relocation would work. Can I prevent
GCC from using GOTOFF32 relocations to symbols located in the .text
segment?

Can someone on the list verify with their toolchain that the address
of a static function is loaded using a GOTOFF32 relocation? The
following test should suffice. It's interesting to note that if the
callback function is not static, GCC will use a GOT32 relocation
instead, which will work.

I'm using binutils 2.17 and gcc 4.1.1.

Cheers,
Shaun

$ cat f.c
void g(void (*h)(void)) {}
static void f(void) { g(f); }
$ arm-elf-gcc -mthumb -fPIC -msingle-pic-base -c f.c
$ arm-elf-objdump -rS f.o | tail -3
  24:   lsls r0, r0, #0
   24: R_ARM_GOTOFF32  f
   ...


Re: [uClinux-dev] Re: XIP on an ARM processor (R_ARM_GOTOFF32)

2006-06-28 Thread Shaun Jackman

On 6/27/06, David McCullough <[EMAIL PROTECTED]> wrote:

AFAIK,  you need to drop the -fPIC in favour of -fpic everywhere.



From the GCC manual, -fpic vs. -fPIC `makes a difference on the m68k,
PowerPC and SPARC.' For my purposes, it makes no difference on the
ARM.


You could try some experiments using the older toolchain to see how it
puts binaries together etc.  AFAIK,  thumb and most archs are supported
by it, it provides at least a working reference for fixing a new
toolchain at least,


I have experimented with GCC 4.0.3, 4.1.0, and 4.1.1. I found that
4.1.x behave the same; however, GCC 4.0.3 does not emit GOTOFF32
relocations. Apparently these are a new feature and preferable in some
instances since they do reduce the number of instructions by one.
However, GOTOFF32 does not work when the relocation is emitted for a
symbol in the .text segment. As far as I know, any XIP application
with a static callback function should fail if it's compiled with
arm-elf-gcc 4.1.x. If anyone else is using this toolchain, I'd
appreciate the confirmation. The following diff of the output from GCC
4.0.3 to GCC 4.1.1 illustrates the new behaviour.

Cheers,
Shaun

$ cat f.c
void g(void (*h)(void)) {}
static void f(void) { g(f); }
$ diff -u f.s-4.0.3 f.s-4.1.1
--- f.s-4.0.3   2006-06-28 09:32:54.044964568 -0600
+++ f.s-4.1.1   2006-06-28 08:55:49.880089024 -0600
@@ -8,11 +8,12 @@
   .type   g, %function
g:
   push    {r7, lr}
-   mov r7, sp
   sub sp, sp, #4
-   sub r3, r7, #4
+   add r7, sp, #0
+   mov r3, r7
   str r0, [r3]
   mov sp, r7
+   add sp, sp, #4
   @ sp needed for prologue
   pop {r7, pc}
   .size   g, .-g
@@ -22,10 +23,9 @@
   .type   f, %function
f:
   push    {r7, lr}
-   mov r7, sp
+   add r7, sp, #0
   ldr r3, .L5
   add r3, r3, sl
-   ldr r3, [r3]
   mov r0, r3
   bl  g
   mov sp, r7
@@ -34,6 +34,6 @@
.L6:
   .align  2
.L5:
-   .word   f(GOT)
+   .word   f(GOTOFF)
   .size   f, .-f
-   .ident  "GCC: (GNU) 4.0.3"
+   .ident  "GCC: (GNU) 4.1.1"


Re: [uClinux-dev] Re: XIP on an ARM processor (R_ARM_GOTOFF32)

2006-06-28 Thread Shaun Jackman

On 6/28/06, Shaun Jackman <[EMAIL PROTECTED]> wrote:

I have experimented with GCC 4.0.3, 4.1.0, and 4.1.1. I found that
4.1.x behave the same; however, GCC 4.0.3 does not emit GOTOFF32
relocations. Apparently these are a new feature and preferable in some
instances since they do reduce the number of instructions by one.
However, GOTOFF32 does not work when the relocation is emitted for a
symbol in the .text segment. As far as I know, any XIP application
with a static callback function should fail if it's compiled with
arm-elf-gcc 4.1.x. If anyone else is using this toolchain, I'd
appreciate the confirmation. The following diff of the output from GCC
4.0.3 to GCC 4.1.1 illustrates the new behaviour.

Cheers,
Shaun


I've opened a new bug, GCC #28194 [1], for this matter.

Cheers,
Shaun

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28194


Which patch added R_ARM_GOTOFF32 support?

2006-06-28 Thread Shaun Jackman

Hello Richard, Dan,

I'm trying to track down which part of the GCC source tree makes the
decision to emit either a R_ARM_GOT32 or a R_ARM_GOTOFF32 relocation.
A new feature in GCC 4.1 emits a R_ARM_GOTOFF32 relocation for a
reference to a static function. I thought there was a good chance one
of you two might have added this feature. If you're familiar with the
patch, would you please point me to the relevant ChangeLog entry?
I'm hoping to add an exception for execute in place (XIP) code (-fPIC
-msingle-pic-base) to use R_ARM_GOT32 instead of R_ARM_GOTOFF32.

Many thanks,
Shaun

See also http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28194


Re: Which patch added R_ARM_GOTOFF32 support?

2006-06-28 Thread Shaun Jackman

On 6/28/06, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote:

GOTOFF support has been there for a long while.  Only use of it for
static functions is recent.  It should be easy to find.  But this is
not at all the only problem.  GCC's PIC model assumes a fixed
displacement between segments.


Even if a fixed displacement is a basic assumption, it doesn't seem to
affect the ARM PIC/XIP code drastically, except for this specific case
of a static callback function. Everything else seems to `just work'. A
simple "Hello, world!" application linked statically against newlib
and using stdio (which is fairly non-trivial) works, for example.

I'm not terribly familiar with the GCC source tree. I scanned
config/arm/arm.c and its SVN log for changes that might affect
GOTOFF32, but came up empty. Do you know where the decision of GOT or
GOTOFF would be handled?


We've implemented something similar to what you need for VxWorks.  A
couple of other places had to be changed.  I don't remember if the
VxWorks gcc port was submitted, or just the binutils bits.


Any chance of this work making it into GCC?

Cheers,
Shaun


Re: Which patch added R_ARM_GOTOFF32 support?

2006-06-28 Thread Shaun Jackman

On 6/28/06, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote:

On Wed, Jun 28, 2006 at 03:17:30PM -0600, Shaun Jackman wrote:
> I'm not terribly familiar with the GCC source tree. I scanned
> config/arm/arm.c and its SVN log for changes that might affect
> GOTOFF32, but came up empty. Do you know where the decision of GOT or
> GOTOFF would be handled?

Sorry, it was written quite a while ago.  I don't know.


Do you know who added this feature? Was it you, or perhaps Nick
Clifton or Richard Earnshaw? If I could find the person and the file
that added this feature, I could probably track down the patch in svn
and modify it for XIP.

Cheers,
Shaun


Re: Which patch added R_ARM_GOTOFF32 support?

2006-06-29 Thread Shaun Jackman

On 6/28/06, Daniel Jacobowitz <[EMAIL PROTECTED]> wrote:

It was probably me.  But why can't you do this yourself?  Look at the
assembly.  See what the output string is.  Search for it in
config/arm/.  Use svn blame, as already suggested.


I did search the assembler text and found the constant and relocation
(GOTOFF) is emitted by arm_assemble_integer. Unfortunately, by this
point the code to use that constant has already been emitted, and so
the decision of which relocation to use has already been made.

All I mean to say in this text, is that if a clueful person has the
knowledge in their head as to which file/function is responsible, it
might take that person 30 seconds and a one-line email to pass that
knowledge on to me, and save me a full day of guessing games.

Cheers,
Shaun


Re: Which patch added R_ARM_GOTOFF32 support?

2006-06-29 Thread Shaun Jackman

On 6/29/06, Richard Earnshaw <[EMAIL PROTECTED]> wrote:

No, it was PhilB, but it must have been two or three years ago now.


Thanks, Richard. I suspect svn r71881 is responsible. I'll start
testing and hopefully put a patch together. I would suspect r49871,
but this patch is in 4.0.3, which doesn't emit GOTOFF.

Cheers,
Shaun

r49871
2002-02-19  Philip Blundell  <[EMAIL PROTECTED]>

* config/arm/arm.c (arm_encode_call_attribute): Operate on any
decl, not just FUNCTION_DECL.
(legitimize_pic_address): Handle local SYMBOL_REF like LABEL_REF.
(arm_assemble_integer): Likewise.
* config/arm/arm.h (ARM_ENCODE_CALL_TYPE): Allow any decl to be
marked local.

r71881
2003-09-28  Philip Blundell  <[EMAIL PROTECTED]>

* config/arm/arm.c (legitimize_pic_address): Check
SYMBOL_REF_LOCAL_P, not ENCODED_SHORT_CALL_ATTR_P.
(arm_assemble_integer): Likewise.


[PATCH] config/arm/arm.c: Use GOT instead of GOTOFF when XIP

2006-06-29 Thread Shaun Jackman

This patch forces the use of GOT relocations instead of GOTOFF when
compiling execute-in-place (XIP) code. I've defined XIP as the
combination of -fpic and -msingle-pic-base.

There is room for improvement in using GOTOFF relocations for symbols
that will be in the same addressing space as the GOT (.data likely)
and PC relative relocations for symbols that will be in the same
addressing space as the PC (.text and .rodata likely).

I'm open to suggestions for the name of the new predicate, which I've
named local_symbol_p here. I had named it use_gotoff_p at one point.

Cheers,
Shaun

2006-06-29  Shaun Jackman  <[EMAIL PROTECTED]>

* config/arm/arm.c (local_symbol_p): New function.
(legitimize_pic_address, arm_assemble_integer): Use it to prevent
GOTOFF relocations to the .text segment in execute-in-place code.

Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 115074)
+++ config/arm/arm.c(working copy)
@@ -3193,6 +3193,17 @@
  return 1;
}

+static int
+local_symbol_p(rtx x)
+{
+  /* Execute-in-place code fails if a GOTOFF relocation is used to
+   * address a local symbol in the .text segment. */
+  if (flag_pic && TARGET_SINGLE_PIC_BASE)
+return 0;
+  return GET_CODE (x) == LABEL_REF
+|| (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_LOCAL_P (x));
+}
+
 rtx
 legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg)
{
@@ -3271,10 +3282,7 @@
  else
emit_insn (gen_pic_load_addr_thumb (address, orig));

-  if ((GET_CODE (orig) == LABEL_REF
-  || (GET_CODE (orig) == SYMBOL_REF &&
-  SYMBOL_REF_LOCAL_P (orig)))
- && NEED_GOT_RELOC)
+  if (NEED_GOT_RELOC && local_symbol_p (orig))
pic_ref = gen_rtx_PLUS (Pmode, cfun->machine->pic_reg, address);
  else
{
@@ -11292,12 +11300,10 @@
  if (NEED_GOT_RELOC && flag_pic && making_const_table &&
  (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF))
{
- if (GET_CODE (x) == SYMBOL_REF
- && (CONSTANT_POOL_ADDRESS_P (x)
- || SYMBOL_REF_LOCAL_P (x)))
+ if ((GET_CODE (x) == SYMBOL_REF
+   && CONSTANT_POOL_ADDRESS_P (x))
+ || local_symbol_p (x))
fputs ("(GOTOFF)", asm_out_file);
- else if (GET_CODE (x) == LABEL_REF)
-   fputs ("(GOTOFF)", asm_out_file);
  else
fputs ("(GOT)", asm_out_file);
}
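
The effect of the new predicate can be summarised as a small truth table. The following is a stand-alone C model, not GCC code: `enum ref_kind` and `model_local_symbol_p` are invented stand-ins for the rtx tests in the patch, and the separate CONSTANT_POOL_ADDRESS_P case in arm_assemble_integer is left out.

```c
#include <stdbool.h>

/* Stand-ins for LABEL_REF and for SYMBOL_REF with or without
   SYMBOL_REF_LOCAL_P. */
enum ref_kind { REF_LABEL, REF_SYMBOL_LOCAL, REF_SYMBOL_GLOBAL };

/* Mirrors the logic of the patch's local_symbol_p: in XIP mode
   (PIC plus single PIC base) nothing is treated as local, so GOT is
   used; otherwise labels and local symbols qualify for GOTOFF. */
bool model_local_symbol_p(enum ref_kind kind, bool flag_pic,
                          bool single_pic_base)
{
    if (flag_pic && single_pic_base)
        return false;
    return kind == REF_LABEL || kind == REF_SYMBOL_LOCAL;
}
```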


Re: [PATCH] config/arm/arm.c: Use GOT instead of GOTOFF when XIP

2006-07-07 Thread Shaun Jackman

Any comments on this patch? This patch, or something like it, is
absolutely necessary to support execute-in-place (XIP) on uClinux.

Please cc me in your reply. Cheers,
Shaun

On 6/29/06, Shaun Jackman <[EMAIL PROTECTED]> wrote:

This patch forces the use of GOT relocations instead of GOTOFF when
compiling execute-in-place (XIP) code. I've defined XIP as the
combination of -fpic and -msingle-pic-base.

There is room for improvement in using GOTOFF relocations for symbols
that will be in the same addressing space as the GOT (.data likely)
and PC relative relocations for symbols that will be in the same
addressing space as the PC (.text and .rodata likely).

I'm open to suggestions for the name of the new predicate, which I've
named local_symbol_p here. I had named it use_gotoff_p at one point.

Cheers,
Shaun

2006-06-29  Shaun Jackman  <[EMAIL PROTECTED]>

* config/arm/arm.c (local_symbol_p): New function.
(legitimize_pic_address, arm_assemble_integer): Use it to prevent
GOTOFF relocations to the .text segment in execute-in-place code.

Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 115074)
+++ config/arm/arm.c(working copy)
@@ -3193,6 +3193,17 @@
   return 1;
 }

+static int
+local_symbol_p(rtx x)
+{
+  /* Execute-in-place code fails if a GOTOFF relocation is used to
+   * address a local symbol in the .text segment. */
+  if (flag_pic && TARGET_SINGLE_PIC_BASE)
+return 0;
+  return GET_CODE (x) == LABEL_REF
+|| (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_LOCAL_P (x));
+}
+
  rtx
  legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg)
 {
@@ -3271,10 +3282,7 @@
   else
emit_insn (gen_pic_load_addr_thumb (address, orig));

-  if ((GET_CODE (orig) == LABEL_REF
-  || (GET_CODE (orig) == SYMBOL_REF &&
-  SYMBOL_REF_LOCAL_P (orig)))
- && NEED_GOT_RELOC)
+  if (NEED_GOT_RELOC && local_symbol_p (orig))
pic_ref = gen_rtx_PLUS (Pmode, cfun->machine->pic_reg, address);
   else
{
@@ -11292,12 +11300,10 @@
   if (NEED_GOT_RELOC && flag_pic && making_const_table &&
  (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF))
{
- if (GET_CODE (x) == SYMBOL_REF
- && (CONSTANT_POOL_ADDRESS_P (x)
- || SYMBOL_REF_LOCAL_P (x)))
+ if ((GET_CODE (x) == SYMBOL_REF
+   && CONSTANT_POOL_ADDRESS_P (x))
+ || local_symbol_p (x))
fputs ("(GOTOFF)", asm_out_file);
- else if (GET_CODE (x) == LABEL_REF)
-   fputs ("(GOTOFF)", asm_out_file);
  else
fputs ("(GOT)", asm_out_file);
}


mthumb in specs file

2005-02-24 Thread Shaun Jackman
Is it possible by hacking the specs file to change the target for
arm-elf-gcc from -marm to -mthumb? I tried a few obvious things like
changing marm in *multilib_defaults to mthumb, but this did not have
the desired effect.

Please cc me in your reply. Thanks!
Shaun


-Ttext with -mthumb causes relocation truncated to fit

2005-02-24 Thread Shaun Jackman
When -Ttext is used in combination with -mthumb it causes a relocation
truncated to fit message. What does this mean, and how do I fix it?

Please cc me in your reply. Thanks,
Shaun

$ arm-elf-gcc --version | head -1
arm-elf-gcc (GCC) 3.4.0
$ cat hello.c
int main() { return 0; }
$ arm-elf-gcc -Ttext 0x200 -mthumb hello.c
/opt/pathport/lib/gcc/arm-elf/3.4.0/thumb/crtbegin.o(.init+0x0): In
function `$t':
: relocation truncated to fit: R_ARM_THM_PC22 frame_dummy
/opt/pathport/lib/gcc/arm-elf/3.4.0/../../../../arm-elf/lib/thumb/crt0.o(.text+0x9a):../../../../../../../gcc-3.4.0/newlib/libc/sys/arm/crt0.S:200:
relocation truncated to fit: R_ARM_THM_PC22 _init
/opt/pathport/lib/gcc/arm-elf/3.4.0/thumb/crtend.o(.init+0x0): In function `$t':
: relocation truncated to fit: R_ARM_THM_PC22 __do_global_ctors_aux
collect2: ld returned 1 exit status
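
In general, "relocation truncated to fit" means the value the linker computed for a relocation does not fit in the bits the instruction provides. For a Thumb BL (R_ARM_THM_PC22) the reach is roughly plus or minus 4 MB from the branch. A hedged sketch of that range check in C follows; the function name and the example addresses are made up, and it does not explain why -Ttext pushes these particular branches out of range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a Thumb BL at address `place` can reach `target`.
   The displacement is taken relative to place + 4 (the Thumb PC) and
   must fit in a signed 23-bit byte offset, i.e. 22 bits of halfwords. */
bool thumb_bl_in_range(uint32_t place, uint32_t target)
{
    int64_t disp = (int64_t)target - ((int64_t)place + 4);
    return disp >= -(INT64_C(1) << 22) && disp < (INT64_C(1) << 22);
}
```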


Specifying a linker script from the specs file

2005-02-24 Thread Shaun Jackman
I have had no trouble specifying the linker script using the -T
switch to gcc. I am now trying to specify the linker script from a
specs file like so:

%rename link old_link
*link:
-Thello.ld%s %(old_link)

gcc complains though about linking Thumb code against ARM libraries --
I've specified -mthumb to gcc --
/opt/pathport/lib/gcc/arm-elf/3.4.0/../../../../arm-elf/bin/ld:
/opt/pathport/arm-elf/lib/libc.a(memset.o)(memset): warning:
interworking not enabled.

Why does the above specs snippet cause gcc to forget it's linking
against thumb libraries?

Please cc me in your reply. Thanks,
Shaun

$ arm-elf-gcc --version | head -1
arm-elf-gcc (GCC) 3.4.0
$ cat hello.c
int main() { return 0; }
$ cat hello.specs
%rename link old_link
*link:
-Thello.ld%s %(old_link)
$ diff /opt/pathport/arm-elf/lib/ldscripts/armelf.xc hello.ld
12c12
<   PROVIDE (__executable_start = 0x8000); . = 0x8000;
---
>   PROVIDE (__executable_start = 0x200); . = 0x200;
181c181
< .stack 0x8 :
---
> .stack 0x2100 :
$ arm-elf-gcc -mthumb -Thello.ld hello.c
$ arm-elf-gcc -mthumb -specs=hello.specs hello.c 2>&1 | head -1
/opt/pathport/lib/gcc/arm-elf/3.4.0/../../../../arm-elf/bin/ld:
/opt/pathport/arm-elf/lib/libc.a(memset.o)(memset): warning:
interworking not enabled.


Re: -Ttext with -mthumb causes relocation truncated to fit

2005-02-25 Thread Shaun Jackman
> I did fix at least one bug, such that -Ttext does something useful with
> ELF toolchains, if your linker script is set up to use it.  I think the
> ARM BPABI script may be the only one set up that way, though.

That would be excellent. I dislike having to read / modify / write
(read: fork) an existing linker script only to change the org and
stack location. Speaking of which, the addition of a -Tstack flag
would eliminate any need I have for a linker script.

Where can I find the ARM Base Platform ABI linker script you mention?

Cheers,
Shaun


Re: [avr-libc-dev] [bug #21623] boot.h: Use the "z" register constraint

2007-11-21 Thread Shaun Jackman
(cc'ing gcc@gcc.gnu.org)

On Nov 21, 2007 2:38 AM, Wouter van Gulik <[EMAIL PROTECTED]> wrote:
> Also consider the fuse bit get routine. This scheme gives more knowledge
> to the compiler, unfortunately gcc fails to see the loading of r31 can
> be done once:
>
> using this:
>
> =
> static inline uint8_t boot_lock_fuse_bits_new(uint16_t address)
> {
>  uint8_t result;
>  register uint16_t adr asm("r30") = address; // make sure it's in the Z register, aka r30:r31
>
>  asm volatile(
>  "sts %1, %2\n\t"
>  "lpm %0, Z"
>  : "=r" (result)
>  : "i" (_SFR_MEM_ADDR(__SPM_REG)),
>"r" ((uint8_t)__BOOT_LOCK_BITS_SET),
>"z" (adr)
>  : "r0"
>  );
>  return result;
> }
>
> uint8_t bar(void)
> {
> uint8_t temp;
> uint16_t adr = 0;
> temp  = boot_lock_fuse_bits_new(adr++);
> temp += boot_lock_fuse_bits_new(adr++);
> temp += boot_lock_fuse_bits_new(adr++);
> temp += boot_lock_fuse_bits_new(adr++);
> return temp;
> }
>
> =
>
> It gives this assembler output:
> .global bar
> .type   bar, @function
> bar:
> /* prologue: frame size=0 */
> /* prologue end (size=0) */
> ldi r30,lo8(0)   ;  8   *movhi/4[length = 2]
> ldi r31,hi8(0)
> ldi r25,lo8(9)   ;  10  *movqi/2[length = 1]
> /* #APP */
> sts 87, r25
> lpm r24, Z
> /* #NOAPP */
> ldi r30,lo8(1)   ;  16  *movhi/4[length = 2]
> ldi r31,hi8(1)
> /* #APP */
> sts 87, r25
> lpm r30, Z
> /* #NOAPP */
> add r24,r30  ;  22  addqi3/1[length = 1]
> ldi r30,lo8(2)   ;  24  *movhi/4[length = 2]
> ldi r31,hi8(2)
> /* #APP */
> sts 87, r25
> lpm r18, Z
> /* #NOAPP */
> ldi r30,lo8(3)   ;  29  *movhi/4[length = 2]
> ldi r31,hi8(3)
> /* #APP */
> sts 87, r25
> lpm r25, Z
> /* #NOAPP */
> add r25,r18  ;  36  addqi3/1[length = 1]
> add r24,r25  ;  37  addqi3/1[length = 1]
> clr r25  ;  45  zero_extendqihi2/1  [length = 1]
> /* epilogue: frame size=0 */
> ret
> /* epilogue end (size=1) */
> /* function bar size 30 (29) */
> .size   bar, .-bar
>
>
> This is not smaller nor faster but it could have been. If gcc would
> leave r31, or do a adiw
> I tried against 4.1.2 using -Wall -Os -mmcu=atmega16. Maybe 4.2.2 or
> 4.3.0 is better?
>
> It does however use r30 as output which could save some speed and code
> when no other register is available.
>
> HTH,
>
> Wouter

I have also noticed that a series of
p = buf; *p++; *p++; *p++;
gets optimized to
buf[0]; buf[1]; buf[2];
which may be faster on some architectures, but loading constants is
quite expensive on the AVR. I don't know a terrible lot about GCC
optimisations, but I suspect it would be related to the constant pool
management, to realise that we already have a 2 in the constant pool,
and we can best introduce a 3 to the constant pool by incrementing 2.

Cheers,
Shaun
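
The two forms being compared look like this when written out as C functions. The names are made up, and the sketch only shows that the forms are equivalent at the source level; which one GCC actually emits for AVR is the observation made above, not something this code demonstrates.

```c
#include <stdint.h>

/* Post-increment form: the pointer walks through the buffer. */
uint8_t sum3_postinc(const uint8_t *buf)
{
    const uint8_t *p = buf;
    uint8_t sum = *p++;
    sum += *p++;
    sum += *p++;
    return sum;
}

/* Indexed form: each access uses a constant offset from the base. */
uint8_t sum3_indexed(const uint8_t *buf)
{
    return (uint8_t)(buf[0] + buf[1] + buf[2]);
}
```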


Disabling the heuristic inliner

2007-12-19 Thread Shaun Jackman
Is it possible to disable the heuristic inline function logic? I would
like to select the following behaviour:

* all static inline functions are always inlined
* all static functions that are called once are inlined
(-finline-functions-called-once)
* no other functions are inlined

I'm using -Os and I'm trying to find the right combination of -f
switches to enable this behaviour. I tried
-finline-limit=1 -finline-functions-called-once
but it caused static inline functions to not be inlined.

Thanks,
Shaun
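
Separately from command-line switches, GCC offers per-function control through the `always_inline` and `noinline` attributes. The sketch below shows both; it is not an answer to the question about -f switches, and the function names are invented.

```c
/* always_inline: inline even when the heuristics would decline. */
static inline __attribute__((always_inline)) int scale(int x)
{
    return 3 * x;
}

/* noinline: never inline this function into its callers. */
__attribute__((noinline)) int api_entry(int x)
{
    return scale(x) + 1;
}

/* An ordinary caller; api_entry stays a real call here. */
int use_api(int x)
{
    return api_entry(x) - 1;
}
```

The drawback is that every function has to be annotated by hand, which is what a global switch would avoid.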



Re: Disabling the heuristic inliner

2007-12-19 Thread Shaun Jackman
Is there a switch to never inline a non-static function?

Thanks,
Shaun

On Dec 19, 2007 6:25 PM, Shaun Jackman <[EMAIL PROTECTED]> wrote:
> Is it possible to disable the heuristic inline function logic? I would
> like to select the following behaviour:
>
> * all static inline functions are always inlined
> * all static functions that are called once are inlined
> (-finline-functions-called-once)
> * no other functions are inlined
>
> I'm using -Os and I'm trying to find the right combination of -f
> switches to enable this behaviour. I tried
> -finline-limit=1 -finline-functions-called-once
> but it caused static inline functions to not be inlined.
>
> Thanks,
> Shaun