from:"Jamie Prescott"

Avoid subreg access patterns

2009-05-09 Thread Jamie Prescott


Sorry for the newbie question, but I can't seem to get around this.
The backend I'm writing has 32-bit registers and 64-bit registers (used for SF, DF, DI).
There is a direct store instruction from a 64-bit register to memory, and GCC
correctly sees and uses this instruction on other occasions.
Nevertheless, GCC generates a split 32-bit store with subreg access.
Is there a way to prevent GCC from performing split/subreg accesses?

struct mooma {
    double a, b;
};


extern int vva(int k, ...);

int cvva(int u)
{
    struct mooma mm;

    mm.a = u - 1;
    mm.b = 0;

    return vva(u - 2, mm);
}


;; Function cvva (cvva)

Partition 0: size 16 align 4
mm, offset 0

;; Generating RTL for tree basic block 2

;; mm.a = (double) (u + -1)
(insn 6 5 7 .c:44 (set (reg:SI 126)
        (const_int -1 [0x])) -1 (nil))

(insn 7 6 8 .c:44 (set (reg:SI 125)
        (plus:SI (reg/v:SI 124 [ u ])
            (reg:SI 126))) -1 (nil))

(insn 8 7 9 .c:44 (set (reg:DF 127)
        (float:DF (reg:SI 125))) -1 (nil))

(insn 9 8 10 .c:44 (set (mem/s/c:SI (plus:SI (reg/f:SI 118 virtual-stack-vars)
                (const_int -16 [0xfff0])) [0+0 S4 A32])
        (subreg:SI (reg:DF 127) 0)) -1 (nil))

(insn 10 9 0 .c:44 (set (mem/s/c:SI (plus:SI (reg/f:SI 118 virtual-stack-vars)
                (const_int -12 [0xfff4])) [0+4 S4 A32])
        (subreg:SI (reg:DF 127) 4)) -1 (nil))

;; mm.b = 0.0
(insn 11 10 12 .c:45 (set (mem/s/c:SI (plus:SI (reg/f:SI 118 virtual-stack-vars)
                (const_int -8 [0xfff8])) [0+0 S4 A32])
        (const_int 0 [0x0])) -1 (nil))

(insn 12 11 0 .c:45 (set (mem/s/c:SI (plus:SI (reg/f:SI 118 virtual-stack-vars)
                (const_int -4 [0xfffc])) [0+4 S4 A32])
        (const_int 0 [0x0])) -1 (nil))



The movdf instruction is defined as (the 'x' regclass maps the 64 bit 
registers):

(define_insn "movdf"
  [(set (match_operand:DF 0 "nonimmediate_operand" "=x,x,m")
(match_operand:DF 1 "general_operand" "m,x,x"))]
  ""
  "@
  ldr.f\t%1,%0
  fdmov\t%1,%0
  str.f\t%1,%0"
  [
   (set_attr "length" "3,6,3")
   (set_attr "cc" "none,none,none")
  ]
)


Since this instruction set has no direct move of a 64-bit constant to a
register, I defined this:

enum reg_class
js1_preferred_reload_class(rtx x, enum reg_class regclass)
{
    enum machine_mode mode = GET_MODE(x);

    if (regclass == NO_REGS ||
        ((GET_CODE(x) == CONST_INT ||
          GET_CODE(x) == CONST_DOUBLE) && regclass == X_REGS))
        return NO_REGS;

    return GET_MODE_SIZE(mode) > 4 ? NO_REGS: regclass;
}


Is this the problem?
Thank you,

- Jamie

Re: Avoid subreg access patterns

2009-05-09 Thread Jamie Prescott


Never mind, setting STRICT_ALIGNMENT to zero did the trick.
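
A plausible reading of why this helped (an assumption; the thread does not spell it out): the dump shows the stack slot with "align 4" and the mems marked A32, while a DF value is 64 bits wide. On a strict-alignment target a 64-bit access to a 32-bit-aligned slot is not allowed, so the store gets written in aligned pieces. The sketch below is a toy model of that decision, not GCC code; stores_needed() and its parameters are hypothetical names.

```c
#include <assert.h>

/* Toy model, not GCC code: how many store insns are needed for one value.
   mode_bits:        width (and natural alignment) of the value, 64 for DF.
   mem_align_bits:   known alignment of the destination, 32 for "A32".
   strict_alignment: models the target's STRICT_ALIGNMENT setting.  */
static int
stores_needed(unsigned mode_bits, unsigned mem_align_bits, int strict_alignment)
{
    assert(mode_bits > 0 && mem_align_bits > 0);

    /* Destination is aligned enough, or the target tolerates
       misaligned accesses: a single full-width store will do.  */
    if (!strict_alignment || mem_align_bits >= mode_bits)
        return 1;

    /* Otherwise write the value in pieces no wider than the alignment.  */
    return (int) ((mode_bits + mem_align_bits - 1) / mem_align_bits);
}
```

With the values from the dump, stores_needed(64, 32, 1) gives the two SI stores seen above, and stores_needed(64, 32, 0) gives the single store that movdf can handle.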


 - Jamie




Extending constraints using register subclasses

2009-05-11 Thread Jamie Prescott


Hi!
I wanted to add finer-grained register subclasses (one per register), so that
I can control register placement inside inline assembly more precisely.
These are the relevant definitions inside my include file:

enum reg_class
{
NO_REGS = 0,
GENERAL_REGS,
X_REGS,
R0_REG, R1_REG, R2_REG, R3_REG,
R4_REG, R5_REG, R6_REG, R7_REG,
X0_REG, X1_REG, X2_REG, X3_REG,
X4_REG, X5_REG, X6_REG, X7_REG,
ALL_REGS,
LIM_REG_CLASSES
};

#define N_REG_CLASSES ((int) LIM_REG_CLASSES)

/* Give names of register classes as strings for dump file.  */

#define REG_CLASS_NAMES \
{   \
"NO_REGS",  \
"GENERAL_REGS", \
"X_REGS",   \
"R0_REG", "R1_REG", "R2_REG", "R3_REG", \
"R4_REG", "R5_REG", "R6_REG", "R7_REG", \
"X0_REG", "X1_REG", "X2_REG", "X3_REG", \
"X4_REG", "X5_REG", "X6_REG", "X7_REG", \
"ALL_REGS", \
"LIM_REGS"  \
}

/* Define which registers fit in which classes.
   This is an initializer for a vector of HARD_REG_SET
   of length N_REG_CLASSES.  */

#define REG_CLASS_CONTENTS  \
{   \
{ 0x, 0x, 0x, 0x }, /* NO_REGS */ \
{ 0x, 0x007f, 0x, 0x }, /* GENERAL_REGS */ \
{ 0x, 0xff80, 0x, 0x007f }, /* X_REGS */ \
{ 0x0001, 0x, 0x, 0x }, /* R0_REG */ \
{ 0x0002, 0x, 0x, 0x }, /* R1_REG */ \
{ 0x0004, 0x, 0x, 0x }, /* R2_REG */ \
{ 0x0008, 0x, 0x, 0x }, /* R3_REG */ \
{ 0x0010, 0x, 0x, 0x }, /* R4_REG */ \
{ 0x0020, 0x, 0x, 0x }, /* R5_REG */ \
{ 0x0040, 0x, 0x, 0x }, /* R6_REG */ \
{ 0x0080, 0x, 0x, 0x }, /* R7_REG */ \
{ 0x, 0x0080, 0x, 0x }, /* X0_REG */ \
{ 0x, 0x0100, 0x, 0x }, /* X1_REG */ \
{ 0x, 0x0200, 0x, 0x }, /* X2_REG */ \
{ 0x, 0x0400, 0x, 0x }, /* X3_REG */ \
{ 0x, 0x0800, 0x, 0x }, /* X4_REG */ \
{ 0x, 0x1000, 0x, 0x }, /* X5_REG */ \
{ 0x, 0x2000, 0x, 0x }, /* X6_REG */ \
{ 0x, 0x4000, 0x, 0x }, /* X7_REG */ \
{ 0x, 0x, 0x, 0x007f }, /* ALL_REGS */ \
}

/* The same information, inverted:
   Return the class number of the smallest class containing
   reg number REGNO.  This could be a conditional expression
   or could index an array.  */
#define REGNO_REG_CLASS(REGNO) xxx_regno_reg_class(REGNO)

/* The class value for index registers, and the one for base regs.  */
#define INDEX_REG_CLASS NO_REGS
#define BASE_REG_CLASS  GENERAL_REGS

#define CLASS_REG_SIZE(CLASS) xxx_class_reg_size(CLASS)

#define CONSTRAINT_LEN(C, STR)  \
((((STR)[0] == 'a' || (STR)[0] == 'b') && (STR)[1] >= '0' && (STR)[1] <= '7') ? 2: \
 DEFAULT_CONSTRAINT_LEN(C, STR))

/* Get reg_class from a letter and string, such as appears in the machine 
description.  */
#define REG_CLASS_FROM_CONSTRAINT(CHAR, STR)\
xxx_reg_class_from_constraint(CHAR, STR)


I then have this inside my main C file:

int
xxx_reg_class_from_constraint(int c, char const *str)
{
    if (str[0] == 'a' && str[1] >= '0' && str[1] <= '7')
        return R0_REG + (str[1] - '0');
    if (str[0] == 'b' && str[1] >= '0' && str[1] <= '7')
        return X0_REG + (str[1] - '0');

    return c == 'r' ? GENERAL_REGS: (c == 'x' ? X_REGS: NO_REGS);
}

int
xxx_regno_reg_class(int regno)
{
    if (regno >= 0 && regno <= 7)
        return R0_REG + regno;
    if (regno >= XXX_FIRST_XREG && regno <= XXX_FIRST_XREG + 7)
        return X0_REG + (regno - XXX_FIRST_XREG);

    return XXX_REG_32BIT(regno) ? GENERAL_REGS:
        (XXX_REG_XREG(regno
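
The two-character constraint scheme described in this message ('a0'..'a7' for r0..r7, 'b0'..'b7' for x0..x7) can be tried outside GCC with a small stand-alone sketch. The enum mirrors the one in the message; toy_constraint_len() and toy_class_from_constraint() are hypothetical stand-ins for the CONSTRAINT_LEN and REG_CLASS_FROM_CONSTRAINT hooks, and the fallback length of 1 is an assumption standing in for DEFAULT_CONSTRAINT_LEN.

```c
#include <assert.h>
#include <stddef.h>

enum reg_class {
    NO_REGS = 0, GENERAL_REGS, X_REGS,
    R0_REG, R1_REG, R2_REG, R3_REG, R4_REG, R5_REG, R6_REG, R7_REG,
    X0_REG, X1_REG, X2_REG, X3_REG, X4_REG, X5_REG, X6_REG, X7_REG,
    ALL_REGS, LIM_REG_CLASSES
};

/* Nonzero if str starts with the letter `letter` followed by '0'..'7'.  */
static int
is_single_reg_constraint(const char *str, char letter)
{
    return str[0] == letter && str[1] >= '0' && str[1] <= '7';
}

/* Length of the constraint at the start of str: 2 for aN/bN, else 1
   (1 is an assumed default, not taken from GCC).  */
static size_t
toy_constraint_len(const char *str)
{
    assert(str != NULL && str[0] != '\0');
    if (is_single_reg_constraint(str, 'a') || is_single_reg_constraint(str, 'b'))
        return 2;
    return 1;
}

/* Register class named by the constraint at the start of str.  */
static enum reg_class
toy_class_from_constraint(const char *str)
{
    assert(str != NULL && str[0] != '\0');
    if (is_single_reg_constraint(str, 'a'))
        return (enum reg_class) (R0_REG + (str[1] - '0'));
    if (is_single_reg_constraint(str, 'b'))
        return (enum reg_class) (X0_REG + (str[1] - '0'));
    if (str[0] == 'r')
        return GENERAL_REGS;
    if (str[0] == 'x')
        return X_REGS;
    return NO_REGS;
}
```

Note that the arithmetic R0_REG + n only works because the enum lists R0_REG..R7_REG and X0_REG..X7_REG contiguously and in order.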

Re: Extending constraints using register subclasses

2009-05-11 Thread Jamie Prescott


Thank you Andrew, I wasn't aware of that. I'll go that way.
Just out of curiosity, was there something flawed in the approach I took before?
That is, could it have been done that way, with my code simply being wrong somewhere?


 - Jamie



- Original Message 
> From: Andrew Pinski 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Monday, May 11, 2009 4:47:57 PM
> Subject: Re: Extending constraints using register subclasses
> 
> On Mon, May 11, 2009 at 4:45 PM, Jamie Prescott wrote:
> >
> > Hi!
> > I wanted to add finer (one per) register subclasses, so that I can more finely control
> > the register placement inside the inline assembly.
> 
> You don't need that.
> You can just use asm("registername") on variables.
> like so:
> 
> int f(int a)
> {
>   register int r0 __asm__("r0");
>   asm("use %0": "+r"(r0) );
> }
> 
> Thanks,
> Andrew Pinski

Code generation problem with optimizations enabled

2009-05-11 Thread Jamie Prescott


Hi!
I have this small piece of code whose generated output is driving me crazy
(GCC 4.3.3).

extern double tle_mk_inf(int);

double tle_exp(double dval)
{
    if (dval == 0.0)
        return 0.0;
    if (dval > 1)
        return tle_mk_inf(1);

    return -1.2;
}


If I compile it with -O2 or -O3, the insn output is missing an 'fcmp'
('cmpdf'), as you can see below:


$LC0:
        long    0
        long    0
$LC1:
        long    0
        long    1086556160
$LC2:
        long    858993459
        long    -1074580685

_tle_exp:
        push    FP
        mov     SP,FP
        push    r8
        fpush   x8
        lea     $LC0,r8
        ldr.f   r8,x8
        fcmp    x0,x8
        jz      $L4
        lea     $LC1,r8
        ldr.f   r8,x8
        << MISSING 'fcmp x0,x8'
        ja      $L5
        lea     $LC2,r8
        ldr.f   r8,x8
$L4:
        fdmov   x8,x0
        fpop    x8
        pop     r8
        mov     FP,SP
        pop     FP
        ret
$L5:
        mov     1,r0
        call    _tle_mk_inf
        fpop    x8
        pop     r8
        mov     FP,SP
        pop     FP
        ret


If I disable optimizations, everything is fine and the 'fcmp' is there.
Even with optimizations enabled, the RTL dump shows the missing 'cmpdf' present
and correctly recognized. The pattern is:

(define_insn "cmpdf"
  [(set (cc0)
(compare (match_operand:DF 0 "fullreg_operand" "x")
 (match_operand:DF 1 "fullreg_operand" "x")))]
  ""
  "fcmp\t%0,%1"
  [
   (set_attr "length" "3")
   (set_attr "cc" "compare")
  ]
)


Any hints?



 - Jamie

Re: Code generation problem with optimizations enabled

2009-05-12 Thread Jamie Prescott


- Original Message 

> From: Jamie Prescott 
> To: gcc@gcc.gnu.org
> Sent: Monday, May 11, 2009 11:59:23 PM
> Subject: Code generation problem with optimizations enabled

> If I disable the optimizations, everything is fine and the 'fcmp' is there.
> Even with optimizations enabled, the RTL dump shows the missing 'cmpdf' 
> present
> and correctly recognized. It being:

What I noticed is that if I call CC_STATUS_INIT (in xxx_notice_update_cc()) even
for insns that do not require it (that is, almost all of them, since only
cmp/fcmp/test modify cc0), cmpdf gets emitted as expected.
Normally all insns except cmp/fcmp/test set "none" in their cc attribute, and
xxx_notice_update_cc() does nothing in that case, while cmp and fcmp (which set
the cc attribute to "compare") call CC_STATUS_INIT and record the DEST and SRC
operands.
Am I doing it wrong?
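
One possible explanation, offered as an assumption rather than a diagnosis: in the listing, 'ldr.f r8,x8' overwrites x8 between the two compares, but because its cc attribute is "none" the recorded status still claims the flags describe compare(x0, x8), so the second, textually identical compare looks redundant. The toy model below replays that sequence; struct cc_state and the function names are hypothetical and are not GCC's cc_status machinery.

```c
#include <assert.h>

/* Toy model of cc0 tracking; not GCC's data structures.  */
struct cc_state {
    int valid;     /* nonzero if the flags describe compare(op0, op1) */
    int op0, op1;  /* register numbers of the recorded compare */
};

/* Called after an insn that writes register `dest` without setting flags.
   With invalidate_on_clobber == 0 this does nothing, which models an
   update hook that ignores "cc none" insns entirely.  */
static void
note_plain_insn(struct cc_state *cc, int dest, int invalidate_on_clobber)
{
    assert(cc != 0);
    if (invalidate_on_clobber && cc->valid
        && (dest == cc->op0 || dest == cc->op1))
        cc->valid = 0;
}

/* Returns 1 if the compare is emitted, 0 if it is judged redundant.  */
static int
emit_compare(struct cc_state *cc, int op0, int op1)
{
    assert(cc != 0);
    if (cc->valid && cc->op0 == op0 && cc->op1 == op1)
        return 0;
    cc->valid = 1;
    cc->op0 = op0;
    cc->op1 = op1;
    return 1;
}

/* Replays: fcmp x0,x8 ; ldr.f r8,x8 ; fcmp x0,x8.
   Returns how many compares end up in the output.  */
static int
replay_tle_exp(int invalidate_on_clobber)
{
    enum { X0 = 0, X8 = 8 };
    struct cc_state cc = { 0, 0, 0 };
    int emitted = 0;

    emitted += emit_compare(&cc, X0, X8);
    note_plain_insn(&cc, X8, invalidate_on_clobber);  /* the reload of x8 */
    emitted += emit_compare(&cc, X0, X8);
    return emitted;
}
```

In this model replay_tle_exp(0) emits one compare and replay_tle_exp(1) emits two, which matches the observation that resetting the status on every insn makes the missing fcmp reappear.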


- Jamie

Re: Code generation problem with optimizations enabled

2009-05-12 Thread Jamie Prescott


Thank you Paolo, I'll take a look at it.
Is there a reason why the fcmp insn was dropped with that implementation?


 - Jamie



- Original Message 
> From: Paolo Bonzini 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Tuesday, May 12, 2009 1:31:53 AM
> Subject: Re: Code generation problem with optimizations enabled
> 
> > What I noticed is that if I CC_STATUS_INIT (in xxx_notice_update_cc()) even for insn that
> > do not require it (that are almost all of them - being only cmp/fcmp/test that modify cc0),
> > cmpdf gets emitted regularly.
> 
> If so, you should not be using cc0, but a CCmode register instead.
> 
> See for example how the fr30 port implements compare-and-jump.  I'm
> currently committing a merge that changes a bit how the
> compare-and-jumps are realized; if you wait a few hours, you'll get a
> more up-to-date example.
> 
> Paolo

Re: Code generation problem with optimizations enabled

2009-05-12 Thread Jamie Prescott


Thanks Paolo, that worked great and simplified things quite a bit.


- Jamie

Re: Extending constraints using register subclasses

2009-05-13 Thread Jamie Prescott


> >> On Mon, May 11, 2009 at 4:45 PM, Jamie Prescott wrote:
> >> 
> >>> Hi!
> >>> I wanted to add finer (one per) register subclasses, so that I can more finely control
> >>> the register placement inside the inline assembly.
> >> 
> >> You don't need that.
> >> You can just use asm("registername") on variables.
> >> like so:
> >> 
> >> int f(int a)
> >> {
> >>  register int r0 __asm__("r0");
> >>  asm("use %0": "+r"(r0) );
> >> }
> >> 
> >> Thanks,
> >> Andrew Pinski
> 
> That works with gcc 3.x but causes trouble and won't work smoothly with gcc 4.x.
> 
> Here are just two examples that will crash for target AVR:
> 
> // ice.c 
> typedef union
> {
> unsigned char  asByte[2];
> unsigned short asWord;
> } data16_t;
> 
> data16_t mac10 (data16_t);
> 
> void put_sfrac16 (data16_t q)
> {
> unsigned char i;
> for (i=0; i < 2; i++)
> {
> register data16_t digit asm ("r24");
> digit.asByte[1] = 0;
> 
> digit.asByte[0] = q.asByte[0];
> digit = mac10 (digit);
> q.asByte[0] = digit.asByte[0];
> }
> }
> //
> 
> Compiling this with
> > avr-gcc ice.c -Os -S
> runs into an ICE in fwprop:
> 
> ice.c: In function 'put_sfrac16':
> ice.c:21: internal compiler error: in propagate_rtx, at fwprop.c:469
> 
> > GNU C (WinAVR 20090313) version 4.3.2 (avr) compiled by GNU C
> > version 3.4.5 (mingw-vista special r3), GMP version 4.2.3,
> > MPFR version 2.4.0.
> 
> 
> The second example shows that global register variables are pretty much useless
> in gcc 4.x. I am using global registers in gcc 3.4.6 and they work smoothly. Without
> global regs I would have to pay considerable performance penalties, so I am
> still using avr-gcc 3.4.6 (which produces code that is both faster and smaller
> than code from any avr-gcc 4.x I tested so far: from 4.1 up to 4.5). Here it is:
> 
> // bug.c ///
> // avr-gcc bug.c -Os -S -ffixed-2 -ffixed-3
> 
> register unsigned int currData asm ("r2");
> 
> // from 
> typedef unsigned int uint8_t __attribute__((__mode__(__QI__)));
> 
> // from 
> #define PINB (*(volatile uint8_t *)(0x16 + 0x20))
> #define PORTB (*(volatile uint8_t *)(0x18 + 0x20))
> 
> // from 
> void __vector_1 (void) __attribute__ ((signal,used,externally_visible));
> void __vector_1 (void)
> {
> // Nothing special, just an ISR that depends on currData
> if (currData & 0x4000)
> PORTB |= 2;
> else
> PORTB &= ~2;
> 
> currData = currData << 1;
> }
> 
> void foo (void)
> {
> for (;;)
> {
> while (PINB & 1);
> currData = 0x2123;
> while (!(PINB & 1));
> }
> }
> //
> 
> CSE decides that currData in foo is dead. As everyone can see, the produced
> assembler does not touch the global register R3:R2. Making global reg vars
> volatile is useless (gcc doesn't even warn about that, and gcc doesn't even
> have a flag to tag a global reg volatile).
> 
> There are just (conditional) branches left that jump around and poll SFRs:
> 
> foo:
> /* prologue: function */
> /* frame size = 0 */
> .L13:
> sbic 54-32,0 ;  12*sbix_branch[length = 2]
> rjmp .L13
> .L10:
> sbis 54-32,0 ;  22*sbix_branch[length = 2]
> rjmp .L10
> rjmp .L13 ;  50jump[length = 1]
> .size foo, .-foo
> 
> Global register variables can boost performance in embedded applications.
> It's a pity that this is pretty much useless in "modern" gcc 4.x.

Oh, this is bad news :/
Is the problem only with global register allocations, or with local ones too?
Wouldn't the approach of having register subclasses and handling them with
ad-hoc constraints (the road I took before) be a more universally working solution?


- Jamie

Re: Code generation problem with optimizations enabled

2009-05-13 Thread Jamie Prescott


- Original Message 
> From: Jim Wilson 
> To: Jamie Prescott 
> Cc: Paolo Bonzini ; gcc@gcc.gnu.org
> Sent: Wednesday, May 13, 2009 6:15:07 PM
> Subject: Re: Code generation problem with optimizations enabled
> 
> Jamie Prescott wrote:
> > Thank you Paolo, I'll take a look at it.
> > Is there a reason why the fcmp insn was dropped with such implementation?
> 
> The code that optimizes away redundant cc0 compares is in final.c, in
> final_scan_insn().  It is about line 2310 in my tree, near the comment "Check
> for redundant test and compare instructions".  Try stepping through that code
> to see what is going on.  Maybe you are setting cc_status.value1 but not
> cc_status.value2?
> 
> In any case, as Paolo mentioned, you are better off not using cc0.  cc0 should
> only be used if you have no other choice.  And even then, you still probably
> shouldn't use it.  Every couple of years someone tries to eliminate the cc0
> code, and eventually someone will succeed.

Thanks for the insight. In any case, I took the path Paolo suggested, using
CCmode, and now it's working nicely.


- Jamie

Re: Extending constraints using register subclasses

2009-05-14 Thread Jamie Prescott


- Original Message 

> From: Michael Meissner 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Thursday, May 14, 2009 11:38:18 AM
> Subject: Re: Extending constraints using register subclasses
> 
> On Mon, May 11, 2009 at 04:45:26PM -0700, Jamie Prescott wrote:
> > 
> > Hi!
> > I wanted to add finer (one per) register subclasses, so that I can more finely control
> > the register placement inside the inline assembly.
> > These are the relevant definitions inside my include file:
> 
> In addition to the ordering GENERAL_REGS after the R/X_REGS, you didn't mention
> defining IRA_COVER_CLASSES.  With the new IRA register allocator, you need to
> define a cover class for all registers that can be moved back and forth.  For
> your setup, I would imagine the following would work:
> 
> #define IRA_COVER_CLASSES { GENERAL_REGS }
> 
> If you don't define IRA_COVER_CLASSES, then the compiler assumes each
> register class is unique, and it can't copy between them.

Thanks Michael, I think that might be the culprit indeed!

- Jamie

Writing over [SP]

2009-05-14 Thread Jamie Prescott


This is driving me crazy. Any memory op that tries to write to [SP] gets
automatically nuked by the compiler.
Any offset from SP, positive or negative, has no problems.
What did I do wrong this time?


 - Jamie

Re: Extending constraints using register subclasses

2009-05-14 Thread Jamie Prescott


- Original Message 

> From: Michael Meissner 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Thursday, May 14, 2009 11:38:18 AM
> Subject: Re: Extending constraints using register subclasses
> 
> In addition to the ordering GENERAL_REGS after the R/X_REGS, you didn't mention
> defining IRA_COVER_CLASSES.  With the new IRA register allocator, you need to
> define a cover class for all registers that can be moved back and forth.  For
> your setup, I would imagine the following would work:
> 
> #define IRA_COVER_CLASSES { GENERAL_REGS }
> 
> If you don't define IRA_COVER_CLASSES, then the compiler assumes each
> register class is unique, and it can't copy between them.

OK, I tried reordering the classes by putting the smaller ones first. It did not work.
As for IRA_COVER_CLASSES, this is a new thing, isn't it? I'm currently
on 4.3.3 and I found no mention of it anywhere.


- Jamie

Re: Extending constraints using register subclasses

2009-05-14 Thread Jamie Prescott


- Original Message 
> From: Jamie Prescott 
> To: Michael Meissner 
> Cc: gcc@gcc.gnu.org
> Sent: Thursday, May 14, 2009 4:22:13 PM
> Subject: Re: Extending constraints using register subclasses
> OK, I tried reordering the classes by putting smaller ones first. Did not 
> work.
> As far as IRA_COVER_CLASS, this should be a new thing, isn't it? I'm currently
> on 4.3.3 and I found no mention of it anywhere.

OK, the problem is in reg_fits_class_p().

void dseek(unsigned int offset)
{
    asm volatile ("dsk r56\n\t":: "a2" (offset));
}

The r56 register is part of GENERAL_REGS (0..64), and 'a2' is 'r2', that is, a
subclass (R2_REG) of GENERAL_REGS containing only one register.
In recog.c, in the 'default' case of the switch, there's a call to
REG_CLASS_FROM_CONSTRAINT(), which I hooked to return R2_REG for 'a2'.
That code ends up calling reg_fits_class_p() with 'operand' being the r0 REG
RTX and 'cl' being R2_REG.
That function tests whether r0 is inside R2_REG, which it obviously isn't, and fails.
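
The membership test described above boils down to a bit lookup in the class's register set. The sketch below is a stand-alone toy model, not the recog.c code: toy_reg_set plays the role of a four-word HARD_REG_SET like the rows of REG_CLASS_CONTENTS, and all toy_* names are hypothetical. It also models one assumption about how register allocation interacts with such a class: a one-register class can only ever be satisfied if its single member is not marked fixed.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SET_WORDS 4                      /* four 32-bit words per set */
#define TOY_NUM_REGS  (32u * TOY_SET_WORDS)

typedef struct { uint32_t w[TOY_SET_WORDS]; } toy_reg_set;

static void
toy_add_reg(toy_reg_set *set, unsigned regno)
{
    assert(set != 0 && regno < TOY_NUM_REGS);
    set->w[regno / 32] |= (uint32_t) 1 << (regno % 32);
}

/* The class membership test: is bit `regno` set?  */
static int
toy_reg_in_set(const toy_reg_set *set, unsigned regno)
{
    assert(set != 0 && regno < TOY_NUM_REGS);
    return (int) ((set->w[regno / 32] >> (regno % 32)) & 1u);
}

/* A class is usable for allocation only if it has a non-fixed member.  */
static int
toy_class_allocatable(const toy_reg_set *cl, const toy_reg_set *fixed)
{
    unsigned r;

    assert(cl != 0 && fixed != 0);
    for (r = 0; r < TOY_NUM_REGS; r++)
        if (toy_reg_in_set(cl, r) && !toy_reg_in_set(fixed, r))
            return 1;
    return 0;
}

/* Demo: does hard register `regno` fit a class that holds only r2?  */
static int
toy_fits_r2(unsigned regno)
{
    toy_reg_set r2 = { { 0, 0, 0, 0 } };

    toy_add_reg(&r2, 2);
    return toy_reg_in_set(&r2, regno);
}

/* Demo: is that class allocatable when r0..r7 are (or are not) fixed?  */
static int
toy_r2_allocatable(int r0_r7_fixed)
{
    toy_reg_set r2 = { { 0, 0, 0, 0 } }, fixed = { { 0, 0, 0, 0 } };
    unsigned r;

    toy_add_reg(&r2, 2);
    if (r0_r7_fixed)
        for (r = 0; r <= 7; r++)
            toy_add_reg(&fixed, r);
    return toy_class_allocatable(&r2, &fixed);
}
```

Here toy_fits_r2(0) is 0, which is the failing check from the message, and toy_fits_r2(2) is 1.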


- Jamie

Re: Extending constraints using register subclasses

2009-05-14 Thread Jamie Prescott



Found the culprit. I mistakenly had r0..r7 inside FIXED (on top of CALL_USED)  :|
Once I removed them from FIXED, everything seems to work nicely and passes
a pretty exhaustive test suite.


- Jamie

Compact register allocation

2009-05-14 Thread Jamie Prescott


The VM I'm retargeting GCC to has an instruction that stores/loads multiple
registers, a consecutive range, to/from a memory operand.
I noticed that sometimes the registers allocated by GCC are sparse, and this
prevents the store/load multiple optimization from happening (I have to issue
single push/pop instructions).
Is it possible in some way to instruct GCC to make the allocation compact?
So, for example, instead of allocating r8, r10, r12 and r15, allocate r8..r11?
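
The renaming asked for in the example (r8, r10, r12, r15 becoming r8..r11) can be sketched as a table of {from, to} pairs. This is a stand-alone illustration, not a GCC pass; build_compact_remap() and demo_compacted_regno() are hypothetical names. It assumes all registers in the range are interchangeable, which would not hold for registers pinned by inline assembly or by the calling convention.

```c
#include <assert.h>

typedef struct { int from, to; } reg_remap;

/* used[r] is nonzero when register r is used somewhere in the function.
   Every used register in [first, last] gets the next free number,
   counting up from first.  Only entries that actually change are
   written to map[].  Returns the number of entries written.  */
static int
build_compact_remap(const unsigned char *used, int first, int last,
                    reg_remap *map, int map_cap)
{
    int r, next = first, n = 0;

    assert(used != 0 && map != 0 && first <= last);
    for (r = first; r <= last; r++) {
        if (!used[r])
            continue;
        if (r != next) {
            assert(n < map_cap);
            map[n].from = r;
            map[n].to = next;
            n++;
        }
        next++;
    }
    return n;
}

/* Demo for the r8, r10, r12, r15 case: where does `regno` end up?  */
static int
demo_compacted_regno(int regno)
{
    unsigned char used[16] = { 0 };
    reg_remap map[8];
    int i, n;

    used[8] = used[10] = used[12] = used[15] = 1;
    n = build_compact_remap(used, 8, 15, map, 8);
    for (i = 0; i < n; i++)
        if (map[i].from == regno)
            return map[i].to;
    return regno;   /* not in the table: stays where it is */
}
```

Since a target number can also be a source number (r10 becomes r9 while r12 becomes r10), each register reference has to be renamed exactly once, for instance by looking up only the original number.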


 - Jamie

Re: Compact register allocation

2009-05-14 Thread Jamie Prescott


- Original Message 

> From: Ian Lance Taylor 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Thursday, May 14, 2009 8:57:08 PM
> Subject: Re: Compact register allocation
> 
> Jamie Prescott writes:
> 
> > The VM I'm retargeting GCC to, has an instruction that allows to store/load multiple,
> > a consecutive range of registers, to a memory operand.
> > I noticed that sometime the registers allocated by GCC are sparse, and this prevents
> > the store/load multiple optimization from happening (I have to issue single push/pop).
> > Is it possible in some way to instruct GCC to make the allocation compact?
> > So, for example, instead of allocating r8, r10, r12 and r15, allocate r8..r11?
> 
> Normally gcc will allocate registers in the order they are listed in
> REG_ALLOC_ORDER, which defaults to increasing numeric order.  gcc won't
> normally allocate registers sparsely.  That said, it is quite possible
> for gcc to allocate a register and then discover that it need not be
> allocated.  There isn't currently any way to request that gcc tighten up
> the register allocation.  That would probably require another
> optimization pass to consistently rename registers when there is a hole
> in the allocation order.

Yep, usually it allocates compactly, but I noticed that when there's some inline
assembly with ad-hoc register naming, it starts generating holes.
Is the extra renaming pass something that is available today?
If not, what is the best spot (in the normal GCC target hooks) to trigger it?
And where is the full insn tree passed to such a hook (if it's not a global)?


- Jamie

Re: Compact register allocation

2009-05-14 Thread Jamie Prescott


> From: Ian Lance Taylor 

> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Thursday, May 14, 2009 10:09:40 PM
> Subject: Re: Compact register allocation
> 
> Jamie Prescott writes:
> 
> >> Normally gcc will allocate registers in the order they are listed in
> >> REG_ALLOC_ORDER, which defaults to increasing numeric order.  gcc won't
> >> normally allocate registers sparsely.  That said, it is quite possible
> >> for gcc to allocate a register and then discover that it need not be
> >> allocated.  There isn't currently any way to request that gcc tighten up
> >> the register allocation.  That would probably require another
> >> optimization pass to consistently rename registers when there is a hole
> >> in the allocation order.
> >
> > Yep, usually it allocates compact, but I noticed that when there's some inline assembly
> > with ad-hoc register naming, it starts generating holes.
> > The extra renaming pass, is it something available today?
> 
> No.
> 
> > If not, what is the best spot (in the normal GCC target hooks) to trigger it? Where the
> > full insn tree is passed in such hook (if it's not a global)?
> 
> TARGET_MACHINE_DEPENDENT_REORG.  It will be invoked once for each
> function, and can access and modify the complete RTL insn tree.

Perfect, thanks!


- Jamie

Re: Compact register allocation

2009-05-15 Thread Jamie Prescott

- Original Message 

> From: Ian Lance Taylor 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Thursday, May 14, 2009 10:09:40 PM
> Subject: Re: Compact register allocation
> 
> Jamie Prescott writes:
> 
> > If not, what is the best spot (in the normal GCC target hooks) to trigger it? Where the
> > full insn tree is passed in such hook (if it's not a global)?
> 
> TARGET_MACHINE_DEPENDENT_REORG.  It will be invoked once for each
> function, and can access and modify the complete RTL insn tree.

It wasn't as easy as I expected, likely because my knowledge of GCC internals
is pretty shallow.
This is what I came up with.
The target reorg is great, but it had two problems for me. One is that it runs
after the prologue/epilogue have been emitted, where I had to emit insns based
on the already-remapped status, so I ended up keeping a hash of non-remappable
instructions that I feed through xxx_emit_hashed_insn().
I use the hash to detect which insns do not have to be remapped.
On top of that, since REG RTXs can appear multiple times, I need to add each
REG RTX to the hash after it has been remapped once.
This seems to be working fine, but my biggest doubt is the way I dig into the
individual instructions to find the REG RTXs to remap.
I looked around and ended up with the code below, partially stolen from
print_rtx().
Is there a public/non-hackish API to enumerate all the RTXs contained inside an
instruction?
If not, do I handle all the possible cases in xxx_enum_rtx()?
See the code below, with xxx_reorg() being my reorg function ...

- Jamie

typedef struct s_xxx_reg_remap
{
    int from, to;
} xxx_reg_remap_t;

/* Remap a single REG rtx according to the table. A REG that has already
   been rewritten is skipped, since REG rtxs can appear more than once. */
static void
xxx_remap_reg(rtx insn, xxx_reg_remap_t const *rmap, int n)
{
    int i, regno = REGNO(insn);

    if (xxx_ptrhash_lookup(&insns_hash, (void *) insn, NULL))
        return;

    for (i = 0; i < n; i++)
        if (rmap[i].from == regno) {
            xxx_ptrhash_add(&insns_hash, (void *) insn);
            SET_REGNO(insn, rmap[i].to);
            break;
        }
}

/* Walk the operands of an rtx using its format string, calling proc on
   every REG found. Only 'e' (expression) and 'E'/'V' (vector) operands
   are followed; hashed rtxs are left alone. */
static void
xxx_enum_rtx(rtx insn, xxx_reg_remap_t const *rmap, int n,
             void (*proc)(rtx, xxx_reg_remap_t const *, int))
{
    int i, j, len, vlen;
    char const *fmt;
    rtx x, y;

    if (xxx_ptrhash_lookup(&insns_hash, (void *) insn, NULL))
        return;

    fmt = GET_RTX_FORMAT(GET_CODE(insn));
    len = GET_RTX_LENGTH(GET_CODE(insn));
    for (i = 0; i < len; i++, fmt++) {
        switch (*fmt) {
        case 'e':
            if ((x = XEXP(insn, i)) == NULL_RTX)
                continue;
            if (GET_CODE(x) == REG)
                (*proc)(x, rmap, n);
            else
                xxx_enum_rtx(x, rmap, n, proc);
            break;

        case 'E':
        case 'V':
            if (XVEC(insn, i) == NULL)
                break;
            vlen = XVECLEN(insn, i);
            for (j = 0; j < vlen; j++) {
                if ((y = XVECEXP(insn, i, j)) != NULL_RTX)
                    xxx_enum_rtx(y, rmap, n, proc);
            }
            break;
        }
    }
}

/* Machine-dependent reorg hook: remap every non-note insn of the function. */
static void
xxx_reorg(void)
{
    rtx insn;

    insn = get_insns();
    gcc_assert(GET_CODE(insn) == NOTE);

    for (insn = next_nonnote_insn(insn); insn != NULL_RTX;
         insn = next_nonnote_insn(insn)) {
        xxx_enum_rtx(insn, func_rmap, func_nrmap, xxx_remap_reg);
    }

    xxx_ptrhash_free(&insns_hash);
}

/* Emit an insn that must not be remapped later by xxx_reorg(). */
static rtx
xxx_emit_hashed_insn(rtx insn)
{
    xxx_ptrhash_add(&insns_hash, (void *) insn);

    return emit_insn(insn);
}
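
The traversal idea above can be tried outside of GCC with a toy model. The
sketch below is not GCC code: node_t, node_format[] and walk() are made-up
stand-ins for rtx, GET_RTX_FORMAT() and xxx_enum_rtx(), and the 'visited'
flag plays the role of the pointer hash. It shows why a shared REG must be
rewritten only once. (GCC sources of that period also ship a generic walker,
for_each_rtx() in rtlanal.c, which may be worth a look before hand-rolling
one.)

```c
#include <assert.h>
#include <stddef.h>

enum node_code { NODE_REG, NODE_PLUS, NODE_SET, NODE_PARALLEL };

typedef struct node {
    enum node_code code;
    int regno;              /* meaningful for NODE_REG only */
    int visited;            /* stand-in for the pointer hash */
    struct node *op[2];     /* 'e' operands */
    struct node **vec;      /* 'E' operand */
    int veclen;
} node_t;

typedef struct {
    int from, to;
} reg_remap_t;

/* One format string per node code: 'e' = sub-expression, 'E' = vector. */
static const char *const node_format[] = { "", "ee", "ee", "E" };

static void
remap_reg(node_t *reg, const reg_remap_t *rmap, int n)
{
    int i;

    if (reg->visited)       /* shared node, already rewritten once */
        return;
    for (i = 0; i < n; i++)
        if (rmap[i].from == reg->regno) {
            reg->regno = rmap[i].to;
            reg->visited = 1;
            break;
        }
}

static void
walk(node_t *x, const reg_remap_t *rmap, int n)
{
    const char *fmt;
    int i, j;

    if (x == NULL)
        return;
    if (x->code == NODE_REG) {
        remap_reg(x, rmap, n);
        return;
    }
    fmt = node_format[x->code];
    for (i = 0; fmt[i] != '\0'; i++) {
        switch (fmt[i]) {
        case 'e':
            walk(x->op[i], rmap, n);
            break;
        case 'E':
            for (j = 0; j < x->veclen; j++)
                walk(x->vec[j], rmap, n);
            break;
        }
    }
}
```

With the table {1 -> 0, 2 -> 1} and a register-2 node that is referenced
twice, the node ends up as register 1; without the 'visited' check the second
visit would wrongly push it down to register 0.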

4.4 API changes

2009-05-15 Thread Jamie Prescott


I managed to migrate my code to 4.4, but I have some code I cannot figure out
how to translate.
Before, I was building the TRUE and FALSE instruction lists using
gimplify_and_add(), and then a:

build3(COND_EXPR, void_type_node, cond, a_case, b_case);

Where 'a_case' and 'b_case' were trees.
Now I build two gimple_seqs in the same way, but I cannot see how to pass them
to build3().
Is there an API to get the first tree out of a gimple_seq? Or is there a
build3() equivalent that accepts gimple_seqs?


 - Jamie

Re: Writing over [SP]

2009-05-15 Thread Jamie Prescott


> From: Diego Novillo 

> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Friday, May 15, 2009 2:44:23 PM
> Subject: Re: Writing over [SP]
> 
> On Thu, May 14, 2009 at 18:18, Jamie Prescott wrote:
> >
> > This is driving me crazy. Any memory op that tries to write over [SP] gets
> > automatically nuked by the compiler.
> > Any offset, positive or negative, from SP, has no problems.
> > What did I do wrong this time?
> 
> You need to provide a *whole* lot more information if you want us to
> be able to help you.  Input program, compile flags, changes you've
> made to the compiler...

I made no changes to GCC outside of my TARGET domain.
An instruction like this one:

insn = gen_move_insn(gen_rtx_MEM(SImode,
                                 plus_constant(stack_pointer_rtx, offset)),
                     gen_rtx_REG(SImode, reg));
emit_insn(insn);

does not emit anything if "offset" == 0, while it works for every offset != 0.
The instructions are just discarded.
I will investigate this more deeply as soon as I have completed the port to
4.4, which is currently giving me some headaches.


- Jamie

Re: 4.4 API changes

2009-05-15 Thread Jamie Prescott


> From: Diego Novillo 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Friday, May 15, 2009 2:40:15 PM
> Subject: Re: 4.4 API changes
> 
> On Fri, May 15, 2009 at 17:23, Jamie Prescott wrote:
> 
> > I managed to migrate my code to 4.4, but I've some code I cannot figure out
> > how to translate.
> > Before, I was building the TRUE and FALSE instruction list using
> > gimplify_and_add(),
> > and then a:
> >
> > build3(COND_EXPR, void_type_node, cond, a_case, b_case);
> 
> Are you trying to generate gimple or generic?  Both are different data
> structures now.

This is the varargs code, and I currently solved it by using
append_to_statement_list(), and then adding the resulting tree to pre_p and
post_p using gimplify_and_add().
Is that OK?


- Jamie

Different reload behavior from 4.3.3 to 4.4

2009-05-15 Thread Jamie Prescott


In my VM, the X_REGS class is a generic 64-bit register class that can hold
both 64-bit DImode and 64-bit DFmode.
This class does not allow direct constant loading, so in 4.3.3 I had:

enum reg_class
xxx_preferred_reload_class(rtx x, enum reg_class regclass)
{
    enum machine_mode mode = GET_MODE(x);

    if (regclass == NO_REGS ||
        ((GET_CODE(x) == CONST_INT ||
          GET_CODE(x) == CONST_DOUBLE) && regclass == X_REGS))
        return NO_REGS;
    switch (mode) {
    case DImode:
    case SFmode:
    case DFmode:
        return X_REGS;
    default:
        break;
    }

    return regclass;
}

This was working fine with 4.3.3, while with 4.4 I get an error as if GCC were
trying to load a const_int directly anyway:

./csrc/test_sha512.c:294: error: insn does not satisfy its constraints:
(insn 870 869 43 2 ./csrc/test_sha512.c:200 (set (reg:DI 55 x0)
        (const_int 7640891576956012808 [0x6a09e667f3bcc908])) 31 {movdi} (nil))
./csrc/test_sha512.c:294: internal compiler error: in
reload_cse_simplify_operands, at postreload.c:396


Any hints?


 - Jamie

Re: Different reload behavior from 4.3.3 to 4.4

2009-05-15 Thread Jamie Prescott


> From: Jamie Prescott 
> To: gcc@gcc.gnu.org
> Sent: Friday, May 15, 2009 3:28:18 PM
> Subject: Different reload behavior from 4.3.3 to 4.4
> 
> 

Strange thing, this works:

<<<<<<<<
extern int loola(long long);

int noola(void)
{
    return loola(19291291912LL);
}

>>>>>>>>
$LC0:
        qword   19291291912
_noola:
        push    r8
        lea     $LC0,r8
        ldr.q   r8,x0
        call    _loola
        pop     r8
        ret


While this chokes:

<<<<<<<<<
struct ff {
    long long ll[4];
};

void noola(struct ff *p)
{
    p->ll[0] = 19291291912LL;
}

>>>>>>>>>
ainl.c: In function 'noola':
ainl.c:11: error: insn does not satisfy its constraints:
(insn 17 16 6 2 ainl.c:10 (set (reg:DI 55 x0)
        (const_int 19291291912 [0x47dd9c108])) 31 {movdi} (nil))
ainl.c:11: internal compiler error: in reload_cse_simplify_operands, at
postreload.c:396

Re: Different reload behavior from 4.3.3 to 4.4

2009-05-15 Thread Jamie Prescott


> From: Jamie Prescott 
> To: gcc@gcc.gnu.org
> Sent: Friday, May 15, 2009 4:16:30 PM
> Subject: Re: Different reload behavior from 4.3.3 to 4.4
> 
> 

This is the output of the IRA pass:

Reloads for insn # 6
Reload 0: reload_in (SI) = (symbol_ref/u:SI ("*$LC0") [flags 0x2])
        GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1)
        reload_in_reg: (symbol_ref/u:SI ("*$LC0") [flags 0x2])
        reload_reg_rtx: (reg:SI 8 r8)
Reload 1: reload_out (DI) = (mem/s:DI (reg/v/f:SI 0 r0 [orig:124 p ] [124]) [3 .ll+0 S8 A64])
        NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
        reload_out_reg: (mem/s:DI (reg/v/f:SI 0 r0 [orig:124 p ] [124]) [3 .ll+0 S8 A64])
Reload 2: reload_in (DI) = (mem/u/c/i:DI (symbol_ref/u:SI ("*$LC0") [flags 0x2]) [3 S8 A64])
        X_REGS, RELOAD_FOR_INPUT (opnum = 1), can't combine
        reload_in_reg: (const_int 19291291912 [0x47dd9c108])
        reload_reg_rtx: (reg:DI 55 x0)
deleting insn with uid = 2.


It created the label ($LC0) where the DI value is stored, but then it hits a
"can't combine" and drops it, and goes for a reload_in_reg directly.
Even in the insns, it generates the load of the $LC0 symbol address into r8
for a reload that never happens.
Instead, a move-CONST_INT-to-reg is generated, which 'movdi' can't handle.

(insn 16 3 17 2 ainl.c:10 (set (reg:SI 8 r8)
        (symbol_ref/u:SI ("*$LC0") [flags 0x2])) 28 {*movsi_internal} (nil))

(insn 17 16 6 2 ainl.c:10 (set (reg:DI 55 x0)
        (const_int 19291291912 [0x47dd9c108])) 31 {movdi} (nil))

(insn 6 17 14 2 ainl.c:10 (set (mem/s:DI (reg/v/f:SI 0 r0 [orig:124 p ] [124]) [3 .ll+0 S8 A64])
        (reg:DI 55 x0)) 31 {movdi} (nil))

Any suggestions, or pointers to things that changed in 4.4 vs. 4.3.3 that I
should be aware of, are very much welcome ...


- Jamie

Re: Different reload behavior from 4.3.3 to 4.4

2009-05-15 Thread Jamie Prescott


> From: Jamie Prescott 
> To: gcc@gcc.gnu.org
> Sent: Friday, May 15, 2009 3:28:18 PM
> Subject: Different reload behavior from 4.3.3 to 4.4
> 
> 

Found it.
The reload function was being called with X0_REG, which is a class containing
only X0 and is in turn a subclass of X_REGS.
So the test had to be changed to REG_CLASS_X(regclass), to account for all the
classes holding 64-bit registers.
This was a bitch to diagnose.


enum reg_class
xxx_preferred_reload_class(rtx x, enum reg_class regclass)
{
    enum machine_mode mode = GET_MODE(x);

    if (regclass == NO_REGS ||
        ((GET_CODE(x) == CONST_INT ||
          GET_CODE(x) == CONST_DOUBLE) && REG_CLASS_X(regclass)))
        return NO_REGS;
    switch (mode) {
    case DImode:
    case SFmode:
    case DFmode:
        return REG_CLASS_X(regclass) ? regclass : X_REGS;
    default:
        break;
    }

    return regclass;
}
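
A minimal, standalone sketch of what a REG_CLASS_X()-style test can look like,
assuming register classes are modelled as bitmasks of hard registers. The
enum values, the mask table and the helper names are made up for
illustration; the real macro is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classes; the _S suffix marks them as sketch-only names. */
enum reg_class_sketch {
    NO_REGS_S, X0_REG_S, X_REGS_S, GENERAL_REGS_S, ALL_REGS_S, N_CLASSES_S
};

/* Bit i set means hard register i belongs to the class.
   Assumed layout: registers 0-7 are 32-bit, registers 8-11 are 64-bit. */
static const unsigned class_contents[N_CLASSES_S] = {
    0x000,  /* NO_REGS_S */
    0x100,  /* X0_REG_S: only the first 64-bit register */
    0xF00,  /* X_REGS_S: all 64-bit registers */
    0x0FF,  /* GENERAL_REGS_S */
    0xFFF   /* ALL_REGS_S */
};

/* True if every register of class a is also in class b. */
static bool
class_subset_p(enum reg_class_sketch a, enum reg_class_sketch b)
{
    assert(a < N_CLASSES_S && b < N_CLASSES_S);
    return (class_contents[a] & ~class_contents[b]) == 0;
}

/* True for X_REGS_S and for any non-empty subclass of it, such as X0_REG_S.
   The empty class is excluded explicitly, since it is a subset of everything. */
static bool
reg_class_x_p(enum reg_class_sketch c)
{
    return c != NO_REGS_S && class_subset_p(c, X_REGS_S);
}
```

The point is that comparing against X_REGS with == misses X0_REG, while a
subset test catches every class made only of 64-bit registers.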


- Jamie

Re: Extending constraints using register subclasses

2009-05-16 Thread Jamie Prescott

> From: Andrew Pinski 

> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Monday, May 11, 2009 4:47:57 PM
> Subject: Re: Extending constraints using register subclasses
> 
> On Mon, May 11, 2009 at 4:45 PM, Jamie Prescott wrote:
> >
> > Hi!
> > I wanted to add finer (one per) register subclasses, so that I can more
> > finely control the register placement inside the inline assembly.
> 
> You don't need that.
> You can just use asm("registername") on variables.
> like so:
> 
> int f(int a)
> {
>   register int r0 __asm__("r0");
>   asm("use %0": "+r"(r0) );
> }

I have now managed to get the approach based on register subclasses working.
The above works too, but I somehow found it less clear and more "global" than
inline assembly constraints.
One thing that I cannot see how to implement cleanly with the approach above
is an instruction (like the SWI, software interrupt, that I use inside the VM)
where the C type of a register changes between input and output.
For some of them, r0 can be a 'char*' on input and an 'int' (error code) on
output.
With inline assembly constraints this can be done pretty cleanly, while I am
not aware of how to do it with the approach above. By clean, I mean cast-less.

- Jamie

Re: Extending constraints using register subclasses

2009-05-16 Thread Jamie Prescott


> From: Andrew Pinski 

> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Saturday, May 16, 2009 8:04:59 AM
> Subject: Re: Extending constraints using register subclasses
> 
> On Sat, May 16, 2009 at 7:57 AM, Jamie Prescott wrote:
> > Now I managed to have the approach based on register subclasses working.
> > The above works too, but I somehow found it less clear and more "global"
> > than inline assembly constraints.
> 
> It is not global as the register variables don't escape out of the
> scope.  If you use them in inline functions, it makes them less global
> and also easier to read too.

And how would you cleanly solve a case like this?

int swi_open_file(char const *path, int mode)
{
    int error;

    asm ("swi 32\n\t" : "=r0" (error) : "r0" (path), "r1" (mode));

    return error;
}


- Jamie

nops

2009-05-20 Thread Jamie Prescott


Under which conditions does GCC generate nops?
I noticed that with 4.4.0 gen_nop() is required, which wasn't the case with
4.3.3.
Can I just define an empty insn for nop, or does GCC require a one-byte insn
for its own alignment purposes?


 - Jamie

Re: nops

2009-05-21 Thread Jamie Prescott

- Original Message 

> From: Ian Lance Taylor 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Wednesday, May 20, 2009 9:50:50 PM
> Subject: Re: nops
> 
> Jamie Prescott writes:
> 
> > Under which conditions GCC generates nops?
> 
> It depends entirely on the target.  For many targets, gcc will never
> generate a nop instruction, except as a byproduct of alignment.

My target does not have anything special WRT alignment. I even set the function
alignment to 8, and it still issues gen_nop().
This seems to happen only with -O0, or at least I have noticed it only under
that condition so far.

> > I noticed that with 4.4.0, gen_nop() is required, thing that wasn't with
> > 4.3.3.
> > Can I just define an empty insn for nop, of GCC requires a one-byte insn
> > for its own alignment purposes?
> 
> Normally gcc does alignment by issuing an assembler directive, and the
> assembler is responsible for generating nop instructions when necessary.

I did a quick grep, and the one you mention (ASM_OUTPUT_ALIGN_WITH_NOP) is
not defined in my backend. That macro does not even have a default.
So, in theory, I should not see any nops.
The problem is different. The error is at link time if gen_nop() is not
defined, which means that the insn is generated programmatically.
Grepping the source, I noticed a few files that call gen_nop() directly,
which was not happening with 4.3.3.
So my questions are: is there a way to disable it? If not, can I define an
empty-issue instruction for 'nop'?

- Jamie

Re: nops

2009-05-21 Thread Jamie Prescott


> From: Andrew Pinski 

> To: Jamie Prescott 
> Cc: Ian Lance Taylor ; gcc@gcc.gnu.org
> Sent: Thursday, May 21, 2009 8:22:00 AM
> Subject: Re: nops
> 
> On Thu, May 21, 2009 at 8:13 AM, Jamie Prescott wrote:
> > My target does not have anything special WRT alignment. I even set the
> > function alignment to 8, and it still issues gen_nop().
> > This seem to happen only with -O0, or at least I noticed it only under such
> > condition so far.
> 
> so it happens at -O0 so that the debugger is able to place a
> breakpoint at some gotos.

Thanks! Makes sense, and it means that I can define it as an empty insn for the
version of the VM that does not have 'nop'.


- Jamie

Seeking suggestion

2009-05-22 Thread Jamie Prescott


Suppose you're writing the backend for a VM supporting two architectures, in
which one of them clobbers the CC registers for certain instructions, while
the other does not.
The instructions themselves are exactly the same.
What is the best/shortest/most elegant way to write this, possibly without
duplicating the instructions?
I know I can write a define_expand and redirect, based on the TARGET, to two
different instructions (one with "clobber", the other without), but that's
basically three declarations for each insn. Is there a shorter way?


 - Jamie

Re: Seeking suggestion

2009-05-22 Thread Jamie Prescott


> From: Jamie Prescott 
> To: gcc@gcc.gnu.org
> Sent: Friday, May 22, 2009 10:36:47 AM
> Subject: Seeking suggestion
> 
> 

I ended up doing something like this (long way, but the only one I know of).
Example, for addsi3:

(define_insn "addsi3_xxx2"
  [(set (match_operand:SI 0 "fullreg_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "fullreg_operand" "0,r")
                 (match_operand:SI 2 "fullreg_or_imm_operand" "rn,rn")))]
  ""
  "@
   add\t%0,%2,%0
   add\t%1,%2,%0"
)

(define_insn "addsi3_xxx"
  [(set (match_operand:SI 0 "fullreg_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "fullreg_operand" "0,r")
                 (match_operand:SI 2 "fullreg_or_imm_operand" "rn,rn")))
   (clobber (reg:CC CC_REG))]
  ""
  "@
   add\t%0,%2,%0
   add\t%1,%2,%0"
)

(define_expand "addsi3"
  [(set (match_operand:SI 0 "fullreg_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "fullreg_operand" "0,r")
                 (match_operand:SI 2 "fullreg_or_imm_operand" "rn,rn")))]
  ""
  {
    if (!TARGET_XXX2)
      emit_insn(gen_addsi3_xxx(operands[0], operands[1], operands[2]));
    else
      emit_insn(gen_addsi3_xxx2(operands[0], operands[1], operands[2]));
    DONE;
  }
)


But now I get an invalid rtx sharing error from the push/pop parallels:


.c: In function 'test_dashr':
.c:32: error: invalid rtl sharing found in the insn
(insn 26 3 28 2 .c:26 (parallel [
(insn/f 25 0 0 (set (reg/f:SI 51 SP)
(minus:SI (reg/f:SI 51 SP)
(const_int 4 [0x4]))) -1 (nil))
(set/f (mem:SI (reg/f:SI 51 SP) [0 S4 A8])
(reg:SI 8 r8))
]) -1 (nil))
.c:32: error: shared rtx
(insn/f 25 0 0 (set (reg/f:SI 51 SP)
(minus:SI (reg/f:SI 51 SP)
(const_int 4 [0x4]))) -1 (nil))
.c:32: internal compiler error: internal consistency failure


Any insights?
Thanks,


- Jamie

Re: Seeking suggestion

2009-05-22 Thread Jamie Prescott

> From: Ian Lance Taylor 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Friday, May 22, 2009 5:45:21 PM
> Subject: Re: Seeking suggestion
> 
> Jamie Prescott writes:
> 
> > But now I get and invalid rtx sharing from the push/pop parallels:
> 
> This normally means that you need a copy_rtx somewhere.  Different insns
> may not share data structure.

OK, fixed. I was generating the parallel for push/pop by calling gen_addsi3
directly.
That ended up causing the problem with the code I posted in the previous
message.
Once I open-coded the addsi3 with gen_rtx_SET(gen_rtx_PLUS()), the error went
away.
Is the implementation I posted the only one, or are there shorter/better ones?

- Jamie

Re: Seeking suggestion

2009-05-23 Thread Jamie Prescott


> From: Georg-Johann Lay 
> To: Jamie Prescott 
> Cc: Ian Lance Taylor ; gcc@gcc.gnu.org
> Sent: Saturday, May 23, 2009 12:05:09 AM
> Subject: Re: Seeking suggestion
> 
> Jamie Prescott schrieb:
> 
> > Is the implementation I posted the only one, or there are shorter/better
> > ones?
> 
> You could use insn attribute to express insns' effects on cc_status.
> Have a look at the avr backend.

AVR uses CC0, while I just switched to CCmode.
Is there a reason why something like this would not work?

(define_insn "addsi3_nc"
  [(set (match_operand:SI 0 "fullreg_operand" "=r")
        (plus:SI (match_operand:SI 1 "fullreg_operand" "r")
                 (match_operand:SI 2 "fullreg_or_imm_operand" "rn")))]
  ""
  "..."
)

(define_expand "addsi3"
  [(set (match_operand:SI 0 "fullreg_operand" "=r")
        (plus:SI (match_operand:SI 1 "fullreg_operand" "r")
                 (match_operand:SI 2 "fullreg_or_imm_operand" "rn")))]
  ""
  {
    if (!TARGET_XXX2)
      emit_clobber(gen_rtx_REG(CCmode, CC_REGNUM));
    emit_insn(gen_addsi3_nc(operands[0], operands[1], operands[2]));
    DONE;
  }
)

That would limit it to two instructions per basic insn, instead of the current
three.
Thanks,

- Jamie

Re: Seeking suggestion

2009-05-25 Thread Jamie Prescott


> From: Michael Meissner 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Sunday, May 24, 2009 1:57:19 PM
> Subject: Re: Seeking suggestion
> 
> One way is to use match_scratch, and different register classes for the two
> cases.
> 
> (define_insn "add3"
>   [(set (match_operand:SI 0 "register_operand" "=x,y")
> (plus:SI (match_operand:SI 1 "register_operand" "%x,y")
>  (match_operand:SI 2 "register_operand" "x,y")))
>(clobber (match_scratch:CC 3 "=X,z"))]
>   ""
>   "add %0,%1,%2")
> 
> 
> (define_register_constraint "x" "TARGET_MACHINE ? GENERAL_REGS : NO_REGS"
>   "@internal")
> 
> (define_register_constraint "y" "!TARGET_MACHINE ? GENERAL_REGS : NO_REGS"
>   "@internal")
> 
> (define_register_constraint "z" "CR_REGS" "@internal")
> 
> This assumes you have a register class for the condition code register.  Most
> machines however, use the normal define_expand with two different insns.
> 
> In theory, you could define a second condition code register that doesn't
> actually exist in the machine, and change the clobber from the main CC to the
> fake one.

Hmm, interesting. Thanks Michael.
Though, as you were saying, I'll probably leave them as separate insns. These
should probably have been two separate targets, but I'm too lazy to split them
up ATM.



> > But now I get and invalid rtx sharing from the push/pop parallels:
> > 
> > 
> > .c: In function 'test_dashr':
> > .c:32: error: invalid rtl sharing found in the insn
> > (insn 26 3 28 2 .c:26 (parallel [
> > (insn/f 25 0 0 (set (reg/f:SI 51 SP)
> > (minus:SI (reg/f:SI 51 SP)
> > (const_int 4 [0x4]))) -1 (nil))
> > (set/f (mem:SI (reg/f:SI 51 SP) [0 S4 A8])
> > (reg:SI 8 r8))
> > ]) -1 (nil))
> > .c:32: error: shared rtx
> > (insn/f 25 0 0 (set (reg/f:SI 51 SP)
> > (minus:SI (reg/f:SI 51 SP)
> > (const_int 4 [0x4]))) -1 (nil))
> > .c:32: internal compiler error: internal consistency failure
> 
> I suspect you don't have the proper guards on the push/pop insns, and the
> combiner is eliminating the clobber.  You probably need to have parallel insns
> for the push and pop.

Dunno exactly what was happening. The push/pop were generated with a parallel,
but I was calling gen_addsi3() directly, and this was somehow creating the
problem.
Once I open-coded that with SET(SP, PLUS(SP, SIZE)), the issue disappeared.


- Jamie

Re: Seeking suggestion

2009-05-27 Thread Jamie Prescott


> From: Jim Wilson 
> To: Jamie Prescott 
> Cc: Georg-Johann Lay ; Ian Lance Taylor ; 
> gcc@gcc.gnu.org
> Sent: Tuesday, May 26, 2009 7:47:45 PM
> Subject: Re: Seeking suggestion
> 
> Jamie Prescott wrote:
> > Is there a reason why something like this would not work?
> > if (!TARGET_XXX2)
> >   emit_clobber(gen_rtx_REG(CCmode, CC_REGNUM));
> > emit_insn(gen_addsi3_nc(operands[0], operands[1], operands[2]));
> 
> Yes.  The optimizer will not know that addsi3_nc uses CC_REGNUM, as it is not
> mentioned, so the optimizer will not know that these two RTL instructions
> always need to remain next to each other.  Any optimization pass that moves
> insns around may separate the add from the clobber resulting in broken code.
> This is what parallels are for, to make sure that the clobber and add stay
> together.

Thanks for the explanation. I somehow thought that every insn spit out by a
define_insn was automatically turned into a parallel.
I guess I was wrong.


- Jamie

Re: Seeking suggestion

2009-05-27 Thread Jamie Prescott


> From: Jamie Prescott 
> To: Jim Wilson 
> Cc: Georg-Johann Lay ; Ian Lance Taylor ; 
> gcc@gcc.gnu.org
> Sent: Wednesday, May 27, 2009 10:12:42 AM
> Subject: Re: Seeking suggestion
>
> Thanks for the explanation. I somehow thought that every insn spit out by a 
> define_insn
> was automatically turned into a parallel.

Packed into a parallel, was more what I meant to imply.


- Jamie

Re: Seeking suggestion

2009-05-27 Thread Jamie Prescott


> From: Eric Botcazou 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org; Jim Wilson ; Georg-Johann Lay 
> ; Ian Lance Taylor 
> Sent: Wednesday, May 27, 2009 10:37:24 AM
> Subject: Re: Seeking suggestion
> 
> > Thanks for the explanation. I somehow thought that every insn spit out by a
> > define_insn was automatically turned into a parallel.
> 
> That's true, the template of a define_insn is automatically wrapped up in a 
> PARALLEL.  But your addsi3 is a define_expand and this works differently.

Oh, OK. Thanks. So in case of a define_expand I have to manually pack the
multiple insns into a parallel, if I want them to stick together, right?


- Jamie

Re: Seeking suggestion

2009-05-27 Thread Jamie Prescott


> From: Georg-Johann Lay 
> To: Jamie Prescott 
> Cc: Eric Botcazou ; gcc@gcc.gnu.org; Jim Wilson 
> ; Ian Lance Taylor 
> Sent: Wednesday, May 27, 2009 12:11:08 PM
> Subject: Re: Seeking suggestion
> 
> Jamie Prescott schrieb:
> 
> >>> Thanks for the explanation. I somehow thought that every insn spit out by
> >>> a define_insn was automatically turned into a parallel.
> >>
> >> That's true, the template of a define_insn is automatically wrapped up in
> >> a PARALLEL.  But your addsi3 is a define_expand and this works differently.
> > 
> > 
> > Oh, OK. Thanks. So in case of a define_expand I have to manually pack the
> > multiple insns into a parallel, if I want them to stick together, right?
> 
> You are running in circles... your original solution with two distinct insns
> already did that (implicitly), as one insn expanded to a parallel add+clobber
> and the other to a plain addsi.

Yes, I know. The original solution works.
This sub-thread started from Michael's response to another solution I proposed
in order to avoid doubling the instructions.


- Jamie

Forgetting return values

2009-05-28 Thread Jamie Prescott


What am I missing?
I have a simple:

static inline int set_prop(char const *path, char const *name,
                           void const *data, int size)
{
    int error;

    asm volatile ("int\t11\n\t"
                  : "=a0" (error): "a0" (path), "a1" (name), "a2" (data),
                    "a3" (size));

    return error;
}

extern int calc(int);

int proc(int i)
{
    int j = calc(i);

    return set_prop(0, 0, &j, sizeof(int));
}

The aX classes map to the rX registers, no problem for GCC in there.
The code above, compiled with GCC 4.4.0 and -O3, produces:


_proc:
        push    FP
        mov     SP,FP
        sub     SP,4,SP
        call    _calc
        mov     0,r0
        mov     4,r3
        add     FP,-4,r2
        mov     r0,r1
;   APP
; 69 ".c" 1
        int     11

;   NO_APP
        add     SP,4,SP
        pop     FP
        ret

GCC forgets about the calc() return value, and passes an address (FP-4) that
has never been written to.
But if I add a "memory" clobber to the inline asm, everything comes back to
normal:

_proc:
        push    FP
        mov     SP,FP
        sub     SP,4,SP
        call    _calc
        str.w   r0,FP[-4]
        mov     0,r0
        mov     4,r3
        add     FP,-4,r2
        mov     r0,r1
;   APP
; 69 ".c" 1
        int     11

;   NO_APP
        add     SP,4,SP
        pop     FP
        ret

Why is the memory clobber required, and why does GCC not understand that it
must sync the value to memory when its address is passed to the asm?
Thanks,


- Jamie

Re: Forgetting return values

2009-05-28 Thread Jamie Prescott


> From: Adam Nemet 
> To: Jamie Prescott 
> Cc: gcc@gcc.gnu.org
> Sent: Thursday, May 28, 2009 11:10:49 AM
> Subject: Re: Forgetting return values
> 
> Jamie Prescott writes:
> > static inline int set_prop(char const *path, char const *name,
> >   void const *data, int size)
> > {
> > int error;
> >
> > asm volatile ("int\t11\n\t"
> >   : "=a0" (error): "a0" (path), "a1" (name), "a2" (data),
> > "a3" (size));
> >
> > return error;
> > }
> >
> > extern int calc(int);
> >
> > int proc(int i)
> > {
> > int j = calc(i);
> >
> > return set_prop(0, 0, &j, sizeof(int));
> > }
> ...
> >
> > Why is the memory clobber required, and why GCC does not understand to
> > sync the value to memory when passing the address to a function?
> 
> Because you never inform GCC that you will use the value at
> address *NAME.  Try to use "m"(*name) rather than "a1"(name) in the asm.

That's 'data', not 'name'. But OK, got it. Unfortunately, I cannot use "m"
since that value needs to go into a specific register.
Any other solution?
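
One idiom that is often suggested for this situation, sketched here under
assumptions: keep the pointer in its register operand and add a separate "m"
input that is never referenced in the template, purely to tell GCC that the
asm reads the pointed-to object. The sketch below is not the code from this
thread. set_prop_sketch() and proc_sketch() are hypothetical names, the empty
template stands in for the target-specific "int 11", the generic "r"
constraint stands in for the a0..a3 classes, and it assumes the data really
is an int.

```c
#include <assert.h>
#include <stddef.h>

static inline int
set_prop_sketch(const void *data, int size)
{
    int error;

    assert(data != NULL && size == (int) sizeof(int));

    __asm__ volatile (""            /* nothing emitted; real code would trap here */
                      : "=r" (error)
                      : "0" (0),    /* tie the output register to a known input */
                        "r" (data), /* the pointer still travels in a register */
                        "r" (size),
                        "m" (*(const int *) data)); /* the asm reads *data */

    return error;
}

static int
proc_sketch(int value)
{
    int j = value;

    /* Because of the "m" operand, j has to be stored before the asm runs. */
    return set_prop_sketch(&j, (int) sizeof(int));
}
```

Compared with a blanket "memory" clobber, the extra "m" operand only forces
the one object to be written back, so other values cached in registers can
stay there.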


- Jamie

Re: Forgetting return values

2009-05-28 Thread Jamie Prescott


> From: Andrew Haley 
> To: Jamie Prescott 
> Cc: Adam Nemet ; "gcc-h...@gcc.gnu.org" 
> 
> Sent: Thursday, May 28, 2009 11:48:30 AM
> Subject: Re: Forgetting return values
> 
> Jamie Prescott wrote:
> >> From: Adam Nemet 
> 
> >>> Why is the memory clobber required, and why GCC does not understand to
> >>> sync the value to memory when passing the address to a function?
> >> Because you never inform GCC that you will use the value at
> >> address *NAME.  Try to use "m"(*name) rather than "a1"(name) in the asm.
> > 
> > That's 'data', not 'name'. But OK, got it. unfortunately, I cannot use "m"
> > since that value need to go into a specific register.
> 
> This is not appropriate for gcc@, which is for gcc development.

Sorry, I posted to gcc@ because I thought it was a problem with my TARGET.


- Jamie
