Machine description and code generation

2014-10-29 Thread Mathias Roslund
Hello,

I'm considering attempting a 65816 target but decided it would be a good
idea to start with something simple in order to learn how GCC generate
code. So I created a minimal machine description with just two
instructions (plus the mandatory nop/jump/etc):

(define_mode_iterator INT [QI HI SI])

(define_insn "mov"
  [(set (match_operand:INT 0 "nonimmediate_operand" "=mr")
(match_operand:INT 1 "general_operand" "mri"))
  ]
  ""
  "move\t%0,%1")

(define_insn "add3"
  [(set (match_operand:INT 0 "nonimmediate_operand" "=mr")
(plus:INT (match_operand:INT 1 "general_operand" "mri")
 (match_operand:INT 2 "general_operand" "mri")))
  ]
  ""
  "add\t%0,%1,%2")

As you can see, all operands may be memory, or registers.

I then created some simple tests and had a look at the generated code:

C:
int data[4];
data[0] = data[1] + data[2];
ASM:
mover0,#data;# 6movsi
add r1,r0,#4;# 7addsi3
add r0,r0,#8;# 8addsi3
add data,(r1),(r0)  ;# 9addsi3

C:
int data[4];
data[1] = data[2] + data[3];
ASM:
add data+4,data+8,data+12   ;# 9addsi3

This is the point where I got really confused. Why does GCC split the add
and generate 4 opcodes in the first example? Changing the optimization
level doesn't make a difference. I've tried to set the indirect register
address cost much higher than symbol_ref/label_ref/const in an atempt to
make GCC avoid those indirect loads, but it didn't help. I've tried it
with GCC 4.8.3 and 4.9.1, they both produce similar, but not identical,
results.

The result is equally strange for a simple move:

C:
data[1] = data[0];
ASM:
movedata+4,data ;# 8movsi

C:
data[0] = data[1];
ASM:
add r0,#data,#4 ;# 7addsi3
movedata,(r0)   ;# 8movsi

What's so special about the first entry in an array that causes this? I
have a feeling I'm missing something obvious here :-)

Any ideas?

Best regards,

Mathias Roslund



RE: Machine description and code generation

2014-11-26 Thread Mathias Roslund
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Jeff Law
> Sent: Wednesday, October 29, 2014 9:36 PM
> To: Mathias Roslund; gcc@gcc.gnu.org
> Subject: Re: Machine description and code generation
> 
> On 10/29/14 07:40, Mathias Roslund wrote:
> > Hello,
> >
> > I'm considering attempting a 65816 target but decided it would be a
> > good idea to start with something simple in order to learn how GCC
> > generate code. So I created a minimal machine description with just
> > two instructions (plus the mandatory nop/jump/etc):
> That's ambitious, primarily because something like the 65816's register
file
> does not map well to the types of register files GCC typically works with.
You
> may end up having to do something similar to the virtual registers on the
rl78
> port.

That is/was the plan. I actually took the easy route a while back and create
a MIPS to 65816 recompiler. Works quite well. It wasn't until I started
messing around with tracking register usage and sizes in order to improve
the recompiler that I figured it would be better to attempt a 65816 target
for GCC instead. So I'm basically treating the 65816 as if it were a weird
MIPS like CPU with few instructions, a lot of registers, byte/word
operations and memory to memory operations.

> > What's so special about the first entry in an array that causes this?
> > I have a feeling I'm missing something obvious here :-)
> The best thing to do is to start looking at the various debugging dumps.
>   Probably the most interesting would be the .expand dump which shows
> things as you move from the higher level GIMPLE IL into the initial RTL
IL.  The
> GIMPLE IL will look at lot like C code and should have a form similar to
the
> original input.  The RTL will roughly correspond to instructions.
> 
>  From there you'll either work forward or backwards into gimple or the RTL
> codepaths depending on what your initial analysis shows.

Thanks for the hint. I added the "-da" when compiling and option and had a
look at the output. Was quite interesting and confusing :-) Turned out my
issues were related to costs after all. After implementing TARGET_RTX_COSTS
things started to make a lot more sense.

Since then I've added more instructions and gotten to the point where most
stuff seems to be working. My current issue is that signed divide and all
shift operations insists on sign/zero extending the operands, resulting in
32bit operations even when only 8bit ones would be required. Interestingly,
multiplies and unsigned divides behave as desired, i.e. only operate on the
required number of bits.

Example:

unsigned char data[3];

data[2] = data[0] / data[1];

Is expanded as:

unsigned char _2;
unsigned char _3;
unsigned char _4;
_2 = data[0];
_3 = data[1];
_4 = _2 / _3;
data[2] = _4;

While:

signed char data[3];

data[2] = data[0] / data[1];

Is expanded as:

signed char _4;
int _5;
signed char _6;
int _7;
int _8;
signed char _9;
_4 = data[0];
_5 = (int) _4;
_6 = data[1];
_7 = (int) _6;
_8 = _5 / _7;
_9 = (signed char) _8;
data[2] = _9;

And I really have no idea why the sign extension/truncation does take place
or how to avoid it.

Best regards,

Mathias Roslund




RE: Machine description and code generation

2014-11-27 Thread Mathias Roslund
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: Wednesday, November 26, 2014 6:13 PM
> To: Mathias Roslund
> Cc: GCC
> Subject: Re: Machine description and code generation
> 
> On 26 November 2014 at 16:48, Mathias Roslund
>  wrote:
> > Since then I've added more instructions and gotten to the point where
> > most stuff seems to be working. My current issue is that signed divide
> > and all shift operations insists on sign/zero extending the operands,
> > resulting in 32bit operations even when only 8bit ones would be required.
> 
> That's a matter of the input language.  In C, you get default promotions to 
> int
> from narrower integer types.

Thanks. I had totally forgotten about the C promotion rules.

> > Interestingly,
> > multiplies and unsigned divides behave as desired, i.e. only operate
> > on the required number of bits.
> 
> That's because the results are the same regardless, and the optimizers are
> able to undo the effects of the type promotions.

But isn't the result of an 8bit signed divide the same as the result of a 32bit 
signed divide when both operands are in the 8bit range? That is, shouldn't the 
optimizers be able to do the same for signed divide as well as shift operations?

I noticed I can revert the promotion effects using these instruction patterns:

  [(set (match_operand:SI 0 "nonimmediate_operand" "=mr")
(div:SI (sign_extend:SI (match_operand:QI 1 "general_operand" "mri"))
(sign_extend:SI (match_operand:QI 2 "general_operand" "mri"
  ]

  [(set (match_operand:QI 0 "nonimmediate_operand" "=mr")
(subreg:QI (div:SI (sign_extend:SI (match_operand:QI 1 
"general_operand" "mri"))
   (sign_extend:SI (match_operand:QI 2 
"general_operand" "mri"))) 0))
  ]

The first pattern will allow me to perform the sign extension after the divide 
(which will be orders of magnitude faster) while the second will remove the 
need for any sign extensions all together if the result is truncated to 8bit 
when stored.

It doesn't feel right to put this in the machine description though.

Best regards,

Mathias Roslund