Machine description and code generation
Hello, I'm considering attempting a 65816 target but decided it would be a good idea to start with something simple in order to learn how GCC generate code. So I created a minimal machine description with just two instructions (plus the mandatory nop/jump/etc): (define_mode_iterator INT [QI HI SI]) (define_insn "mov" [(set (match_operand:INT 0 "nonimmediate_operand" "=mr") (match_operand:INT 1 "general_operand" "mri")) ] "" "move\t%0,%1") (define_insn "add3" [(set (match_operand:INT 0 "nonimmediate_operand" "=mr") (plus:INT (match_operand:INT 1 "general_operand" "mri") (match_operand:INT 2 "general_operand" "mri"))) ] "" "add\t%0,%1,%2") As you can see, all operands may be memory, or registers. I then created some simple tests and had a look at the generated code: C: int data[4]; data[0] = data[1] + data[2]; ASM: mover0,#data;# 6movsi add r1,r0,#4;# 7addsi3 add r0,r0,#8;# 8addsi3 add data,(r1),(r0) ;# 9addsi3 C: int data[4]; data[1] = data[2] + data[3]; ASM: add data+4,data+8,data+12 ;# 9addsi3 This is the point where I got really confused. Why does GCC split the add and generate 4 opcodes in the first example? Changing the optimization level doesn't make a difference. I've tried to set the indirect register address cost much higher than symbol_ref/label_ref/const in an atempt to make GCC avoid those indirect loads, but it didn't help. I've tried it with GCC 4.8.3 and 4.9.1, they both produce similar, but not identical, results. The result is equally strange for a simple move: C: data[1] = data[0]; ASM: movedata+4,data ;# 8movsi C: data[0] = data[1]; ASM: add r0,#data,#4 ;# 7addsi3 movedata,(r0) ;# 8movsi What's so special about the first entry in an array that causes this? I have a feeling I'm missing something obvious here :-) Any ideas? Best regards, Mathias Roslund
RE: Machine description and code generation
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of > Jeff Law > Sent: Wednesday, October 29, 2014 9:36 PM > To: Mathias Roslund; gcc@gcc.gnu.org > Subject: Re: Machine description and code generation > > On 10/29/14 07:40, Mathias Roslund wrote: > > Hello, > > > > I'm considering attempting a 65816 target but decided it would be a > > good idea to start with something simple in order to learn how GCC > > generate code. So I created a minimal machine description with just > > two instructions (plus the mandatory nop/jump/etc): > That's ambitious, primarily because something like the 65816's register file > does not map well to the types of register files GCC typically works with. You > may end up having to do something similar to the virtual registers on the rl78 > port. That is/was the plan. I actually took the easy route a while back and create a MIPS to 65816 recompiler. Works quite well. It wasn't until I started messing around with tracking register usage and sizes in order to improve the recompiler that I figured it would be better to attempt a 65816 target for GCC instead. So I'm basically treating the 65816 as if it were a weird MIPS like CPU with few instructions, a lot of registers, byte/word operations and memory to memory operations. > > What's so special about the first entry in an array that causes this? > > I have a feeling I'm missing something obvious here :-) > The best thing to do is to start looking at the various debugging dumps. > Probably the most interesting would be the .expand dump which shows > things as you move from the higher level GIMPLE IL into the initial RTL IL. The > GIMPLE IL will look at lot like C code and should have a form similar to the > original input. The RTL will roughly correspond to instructions. > > From there you'll either work forward or backwards into gimple or the RTL > codepaths depending on what your initial analysis shows. Thanks for the hint. I added the "-da" when compiling and option and had a look at the output. Was quite interesting and confusing :-) Turned out my issues were related to costs after all. After implementing TARGET_RTX_COSTS things started to make a lot more sense. Since then I've added more instructions and gotten to the point where most stuff seems to be working. My current issue is that signed divide and all shift operations insists on sign/zero extending the operands, resulting in 32bit operations even when only 8bit ones would be required. Interestingly, multiplies and unsigned divides behave as desired, i.e. only operate on the required number of bits. Example: unsigned char data[3]; data[2] = data[0] / data[1]; Is expanded as: unsigned char _2; unsigned char _3; unsigned char _4; _2 = data[0]; _3 = data[1]; _4 = _2 / _3; data[2] = _4; While: signed char data[3]; data[2] = data[0] / data[1]; Is expanded as: signed char _4; int _5; signed char _6; int _7; int _8; signed char _9; _4 = data[0]; _5 = (int) _4; _6 = data[1]; _7 = (int) _6; _8 = _5 / _7; _9 = (signed char) _8; data[2] = _9; And I really have no idea why the sign extension/truncation does take place or how to avoid it. Best regards, Mathias Roslund
RE: Machine description and code generation
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com] > Sent: Wednesday, November 26, 2014 6:13 PM > To: Mathias Roslund > Cc: GCC > Subject: Re: Machine description and code generation > > On 26 November 2014 at 16:48, Mathias Roslund > wrote: > > Since then I've added more instructions and gotten to the point where > > most stuff seems to be working. My current issue is that signed divide > > and all shift operations insists on sign/zero extending the operands, > > resulting in 32bit operations even when only 8bit ones would be required. > > That's a matter of the input language. In C, you get default promotions to > int > from narrower integer types. Thanks. I had totally forgotten about the C promotion rules. > > Interestingly, > > multiplies and unsigned divides behave as desired, i.e. only operate > > on the required number of bits. > > That's because the results are the same regardless, and the optimizers are > able to undo the effects of the type promotions. But isn't the result of an 8bit signed divide the same as the result of a 32bit signed divide when both operands are in the 8bit range? That is, shouldn't the optimizers be able to do the same for signed divide as well as shift operations? I noticed I can revert the promotion effects using these instruction patterns: [(set (match_operand:SI 0 "nonimmediate_operand" "=mr") (div:SI (sign_extend:SI (match_operand:QI 1 "general_operand" "mri")) (sign_extend:SI (match_operand:QI 2 "general_operand" "mri" ] [(set (match_operand:QI 0 "nonimmediate_operand" "=mr") (subreg:QI (div:SI (sign_extend:SI (match_operand:QI 1 "general_operand" "mri")) (sign_extend:SI (match_operand:QI 2 "general_operand" "mri"))) 0)) ] The first pattern will allow me to perform the sign extension after the divide (which will be orders of magnitude faster) while the second will remove the need for any sign extensions all together if the result is truncated to 8bit when stored. It doesn't feel right to put this in the machine description though. Best regards, Mathias Roslund