From: Sylvain Munaut

Design a microcontroller for gcc

2006-02-14 Thread Sylvain Munaut

Hello,


I'm currently considering designing my own microcontroller to use on
an FPGA. Since I'd like to be able to use C to write the
"non-timing-critical" parts of my code, I thought I'd include a gcc port
as part of the design considerations.


I've read the online manual about gcc backends and googled for
comments about gcc on microcontrollers, and I found this thread
particularly interesting:

http://gcc.gnu.org/ml/gcc/2003-03/msg01402.html


Here is a short description of the type of microcontroller I'm thinking of:
* 8 bit RISC microcontroller
* 16 general purpose registers
 * 16-level-deep hardware call stack
 * no special instructions or registers for the stack
 * load/store to a flat "memory" (minimum 1k, up to 16k)
  (addressing in that memory uses a register for the 8 LSBs and
   a special register for the MSBs);
  that memory can be loaded along with the code, e.g. to have known
  content at startup
 * classic add/sub/... take 3 register operands (dest, src1, src2)
 * no hardware multiply or shift-by-n (they have to be done in software)
 * pipelined, with some restrictions on instruction scheduling
  (see below)
 * a special "I/O" space (gcc shouldn't care; we should only
  access it in asm)
 * relative short/long call/branch, absolute short/long
  call/branch (long branches/jumps are achieved by preloading
  a special GPR with the MSBs to use for the next jump)
* 2 flags Carry & Zero for testing.
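Without a hardware multiplier or shift-by-n, a multiply ends up as a software shift-and-add loop (in a gcc port, a library helper called when no multiply pattern exists). The Python model below is a minimal sketch assuming 8-bit operands and a 16-bit result; it only uses primitives the list above does provide: add, shift by one bit, and a low-bit test. The names `mul8x8`, `shl1` and `shr1` are illustrative, not from the thread.

```python
MASK16 = 0xFFFF


def shl1(x):
    """Shift left by one bit, truncated to a 16-bit register."""
    return (x << 1) & MASK16


def shr1(x):
    """Logical shift right by one bit."""
    return x >> 1


def mul8x8(a, b):
    """Unsigned 8x8 -> 16-bit multiply using only add, shift-by-1 and a bit test."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    product = 0
    addend = a                      # a, shifted left once per iteration
    for _ in range(8):              # one iteration per multiplier bit
        if b & 1:                   # test the low bit of the multiplier
            product = (product + addend) & MASK16
        addend = shl1(addend)
        b = shr1(b)
    return product


print(mul8x8(200, 123))  # 24600
```

Each iteration costs a test, a conditional add and two single-bit shifts, which is the price of leaving the multiplier out of the FPGA.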



(Note these are not random choices: I know how to implement them quite
efficiently in an FPGA ...)

My first question would be: "Do you see anything missing that
would be a great plus?"

I mentioned earlier that there are some scheduling restrictions on the
instructions due to internal pipelining. For example, the result of a
fetch from memory may not be used in the instruction directly following
the fetch. When there is a conditional branch, the instruction just
following the branch will always be executed, no matter what the result
is (branch/call/... are not immediate but have a 1-instruction latency
before they occur). Are these kinds of limitations 'easily' supported by
gcc?
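Both restrictions can be illustrated with a small post-pass over an instruction list: pad with a `nop` when a load's destination is read by the very next instruction, and put a `nop` in the slot after each branch. The Python below is a minimal sketch of that idea; the mnemonics (`load`, `in`, `beq`, ...) and the `Insn` record are hypothetical. It fills every delay slot with a `nop`, whereas gcc's delay-slot pass (driven by `define_delay` in the machine description) would try to move a useful instruction there instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Insn:
    op: str                       # hypothetical mnemonic, e.g. "load", "add", "beq"
    dest: Optional[str] = None    # register written, if any
    srcs: tuple = ()              # registers read


NOP = Insn("nop")
LATE_RESULT = {"load", "in"}      # memory and I/O reads: result not ready for the next insn
HAS_DELAY_SLOT = {"br", "beq", "bne", "call", "ret"}


def fix_hazards(insns):
    """Return a copy of `insns` with nops inserted to respect the pipeline rules."""
    out = []
    for i, insn in enumerate(insns):
        out.append(insn)
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        # Load-use hazard: the next instruction must not read the loaded register.
        if insn.op in LATE_RESULT and nxt is not None and insn.dest in nxt.srcs:
            out.append(NOP)
        # Delay slot: the instruction after a branch always executes, so give it a nop.
        if insn.op in HAS_DELAY_SLOT:
            out.append(NOP)
    return out


prog = [
    Insn("load", "r1", ("r2",)),      # r1 = mem[r2]
    Insn("add", "r3", ("r1", "r4")),  # uses r1 immediately -> needs a nop
    Insn("beq"),                      # branch with one delay slot
    Insn("add", "r5", ("r5", "r6")),
]
print([insn.op for insn in fix_hazards(prog)])
# ['load', 'nop', 'add', 'beq', 'nop', 'add']
```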

I've read several times that gcc works better with a lot of GPRs. I could
increase them to 32, but then arithmetic instructions would have to use
the same register for the destination as for src1.

E.g. for the add instruction, I'm planning:

add rD, rA, rB  ; rD = rA + rB
add rD, rD, imm8; rD = rD + immediate_8bits

But if I allow 32 registers, my opcode is too short, so I have to limit it to:

add rD, rD, rA  ; rD = rD + rA
add rD, rD, imm8; rD = rD + immediate_8bits
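The register-count trade-off can be checked with a little arithmetic. Later in the thread the opcode width is given as 18 bits (the block-RAM width); assuming plain binary register fields, whatever is left after the register and immediate fields has to encode the operation and distinguish the instruction formats. A minimal Python sketch (`spare_bits` is an illustrative helper):

```python
OPCODE_WIDTH = 18  # bits, the block-RAM width mentioned later in the thread


def spare_bits(num_regs, reg_fields, imm_bits=0):
    """Bits left for the operation after the register and immediate fields."""
    reg_bits = (num_regs - 1).bit_length()   # bits needed per register field
    return OPCODE_WIDTH - reg_fields * reg_bits - imm_bits


for regs in (16, 32):
    print(f"{regs} regs: 3-reg form leaves {spare_bits(regs, 3)} bits, "
          f"2-reg form {spare_bits(regs, 2)}, reg+imm8 form {spare_bits(regs, 1, 8)}")
# 16 regs: 3-reg form leaves 6 bits, 2-reg form 10, reg+imm8 form 6
# 32 regs: 3-reg form leaves 3 bits, 2-reg form 8, reg+imm8 form 5
```

With 32 registers a three-register form leaves only 3 bits, i.e. at most 8 operations before any bits are spent on telling formats apart, which is why the two-operand `add rD, rD, rA` form becomes necessary.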

I'm more in favor of only 16 registers because:
 - some uCs have even fewer and have a working gcc port
 - 16 is already a good number
 - 16 uses less space in the FPGA ;p
 - I'd rather be able to store the result elsewhere than have
32 registers.

But maybe I'm wrong ...



Any comments, thoughts, ... are welcome!



Sylvain

Re: Design a microcontroller for gcc

2006-02-14 Thread Sylvain Munaut

DJ Delorie wrote:
>>  * 8 bit RISC microcontroller
> 
> Not 16?

Well, at first it was to save space in the FPGA (basically, the regs and
ALU take twice the space) and because many 16-bit ops can be done with
8-bit regs, and lots of the code I write can live with that.

But I had a quick look at the diagrams I drew, and with the
modifications I'd have to make to support 16-bit pointers with 8-bit regs,
the hardware "tricks" might end up costing me more than doubling the reg
banks and ALU.

Another reason is speed ... I'd like to run that thing at 133 MHz in a
low-cost Spartan-3 ... (2 cycles per instruction, so about 66 MIPS in the end)
and a 16-bit carry chain is ... well ... longer ;) I'll do some timing
tests to see if it's viable.

The final reason is immediates. I can't have 16-bit immediates in my
opcode; there is just not enough room ... (opcodes are 18 bits, the width
of a classic block RAM in my FPGA technology). Do you know of a 16-bit
uC that fits gcc really well that I could use as a guide for the
instruction set?

With 16-bit regs, do I have to support 8-bit operations on them?

>>  * 16 level deep hardware call stack
> 
> If you have RAM, why not use it?  Model calls like the PPC - put
> current $pc in a register and jump.  The caller saves the old $pc in
> the regular stack.  GCC is going to want a "normal" frame.  This is
> easy to do in hardware, and more flexible than a hardware call stack.

Well, that solution isn't that easy with 8-bit regs, and the 16-deep
stack costs the same as a single register in the FPGAs I use. But now, with
16-bit regs, it might be simpler and may become the better solution.
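DJ's suggestion can be compared with the 16-deep hardware stack in a small model. In the sketch below (Python, hypothetical class and method names), the hardware-stack CPU pushes return addresses into a fixed 16-entry stack, while the link-register CPU only writes the return address into a register and leaves it to each function's prologue/epilogue to spill and reload it on an ordinary RAM stack. Return addresses are stand-in tags, and delay slots are ignored.

```python
class HardwareStackCPU:
    """Return addresses live in a fixed-depth on-chip stack."""
    DEPTH = 16

    def __init__(self):
        self.stack = []

    def call(self, ret_addr):
        if len(self.stack) == self.DEPTH:
            raise OverflowError("hardware call stack full")
        self.stack.append(ret_addr)

    def enter(self):                   # nothing to save: the hardware did it
        pass

    def leave(self):
        pass

    def ret(self):
        return self.stack.pop()


class LinkRegisterCPU:
    """PPC-style: call writes a link register; software saves it in RAM."""

    def __init__(self, ram_words=1024):
        self.ram = [0] * ram_words
        self.sp = ram_words            # stack grows downwards
        self.lr = 0

    def call(self, ret_addr):
        self.lr = ret_addr

    def enter(self):                   # prologue: push lr like any other register
        if self.sp == 0:
            raise OverflowError("RAM stack full")
        self.sp -= 1
        self.ram[self.sp] = self.lr

    def leave(self):                   # epilogue: reload lr
        self.lr = self.ram[self.sp]
        self.sp += 1

    def ret(self):
        return self.lr


def run(cpu, depth):
    """Make `depth` nested calls, unwind them, and return the return addresses used."""
    returns = []

    def func(level):
        cpu.enter()
        if level < depth:
            cpu.call(level)            # stand-in return address: the caller's level
            func(level + 1)
        cpu.leave()
        returns.append(cpu.ret())

    cpu.call(0)                        # the top-level call
    func(1)
    return returns


def fits(cpu, depth):
    """True if `depth` nested calls complete without overflowing the call stack."""
    try:
        run(cpu, depth)
        return True
    except OverflowError:
        return False


print(fits(HardwareStackCPU(), 17), fits(LinkRegisterCPU(), 17))  # False True
```

On the link-register CPU, a leaf function could also skip the spill entirely, which is one reason this scheme tends to be cheap in practice.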

>>  * load/store to a flat "memory" (minimum 1k, up to 16k)
>>(addressing in that memory uses a register for the 8 lsb and
>>   a special register for the msbs)
>>that memory can be loaded along with the code to have known
>>content at startup for eg.
> 
> GCC is going to want register pairs to work as larger registers.
> Like, if you have $r2 and $r3 as 8 bit registers, gcc wants [$r2$r3]
> to be usable as a 16 bit register.  Another reason to go with 16 bit
> registers ;-)
> 
> GCC won't like having an address split across "special" registers.
> 
> But it's OK to limit index registers to evenly numbered ones.

So if I use 16-bit registers, do I have to handle pairs of them to form
32 bits?

16-bit regs keep sounding better and better ... I know HDL better than
gcc, so if it can ease the gcc port at the cost of a slightly bigger/more
complex HDL design, I'm willing to make the trade-off.
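DJ's point about split addresses shows up as soon as a pointer is incremented. In the original scheme the low 8 bits come from a GPR and the upper bits from a special register, so `p++` has to propagate a carry into a register gcc doesn't allocate; with a 16-bit register it's a single add. A Python sketch, assuming a 16k address space (14-bit addresses) and illustrative function names:

```python
ADDR_MASK = 0x3FFF          # assumed 16k locations -> 14-bit addresses


def split_address(page, low):
    """Original scheme: 8 LSBs from a GPR, upper bits from a special page register."""
    return ((page << 8) | (low & 0xFF)) & ADDR_MASK


def increment_split(page, low):
    """p++ on a split pointer: a carry out of the low byte must reach the page register."""
    low += 1
    if low > 0xFF:          # carry out of the 8-bit GPR
        low = 0
        page += 1           # extra work on a register gcc doesn't allocate
    return page, low


def increment_wide(ptr):
    """p++ when the whole pointer fits in one 16-bit register: a single add."""
    return (ptr + 1) & 0xFFFF


print(hex(split_address(*increment_split(0x12, 0xFF))))  # 0x1300
print(hex(increment_wide(0x12FF)))                       # 0x1300
```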

>>  * 2 flags Carry & Zero for testing.
> 
> GCC will want 4 (add sign and overflow) to support signed comparisons.
> Sign should be easy; overflow is the carry out of bit 6.

Shouldn't be much of a problem to add that.
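To make the flag logic concrete: for an n-bit operation, overflow is the carry into the MSB (the carry out of bit 6 in the 8-bit case DJ mentions) XORed with the carry out of the MSB, and once N and V exist, a signed less-than is simply N XOR V. The Python model below computes N, Z, C and V for a compare (a - b done as a + ~b + 1). It assumes the convention where C = 1 means "no borrow", as on ARM; x86, for example, sets its carry flag on borrow instead. Names are illustrative.

```python
def cmp_flags(a, b, width=8):
    """Flags (N, Z, C, V) for 'cmp a, b', i.e. a - b computed as a + ~b + 1."""
    mask = (1 << width) - 1
    not_b = ~b & mask
    full = a + not_b + 1
    result = full & mask
    c = (full >> width) & 1                       # carry out of the MSB: 1 = no borrow
    low = mask >> 1                               # all bits below the MSB
    c_into_msb = ((a & low) + (not_b & low) + 1) >> (width - 1)
    v = c_into_msb ^ c                            # signed overflow
    n = result >> (width - 1)                     # sign of the result
    z = int(result == 0)
    return n, z, c, v


def signed_less(a, b, width=8):
    n, _, _, v = cmp_flags(a, b, width)
    return bool(n ^ v)                            # needs both N and V


def unsigned_less(a, b, width=8):
    _, _, c, _ = cmp_flags(a, b, width)
    return not c                                  # a borrow occurred


print(signed_less(0x80, 0x01), unsigned_less(0x80, 0x01))  # True False
```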

>>I mentioned earlier that there are some scheduling restrictions on the
>>instructions due to internal pipelining. For example, the result of a
>>fetch from memory may not be used in the instruction directly following
>>the fetch. When there is a conditional branch, the instruction just
>>following the branch will always be executed, no matter what the result
>>is (branch/call/... are not immediate but have a 1-instruction latency
>>before they occur). Are these kinds of limitations 'easily' supported by
>>gcc?
> 
> Delay slots are common; gcc handles them well.  You might need to add
> custom code to enforce the pipeline rules if your pipeline won't
> automatically stall.

Yes, making the pipeline stall isn't easy in my case, and detecting those
cases would cost me logic and decrease my timing margin. The hardware
detects when the preceding instruction writes a register read by the
current instruction and forwards the result. But for memory fetches (and
I/O accesses), there is just no way: the results appear too late ...

Sylvain

Re: Design a microcontroller for gcc

2006-02-15 Thread Sylvain Munaut

DJ Delorie wrote:
>>So if I use 16-bit registers, do I have to handle pairs of them to form
>>32 bits?
> 
> 
> Well, you don't *have* to if your word size is only 16 bits.  GCC will
> still pair them, but you'll need to tell gcc how to split them back up
> for the opcodes you have available.
> 
> Note that there are some operations that gcc assumes you have 32-bit
> opcodes for, though.  Or at least insns that emulate it.

Like what?


> You really want to have native support for sizeof(int) values and
> sizeof(void *) values, bigger things can be emulated or broken up.

Ok, then so it will be ;)
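What "telling gcc how to split them back up" amounts to for a 32-bit add on 16-bit registers is an add of the low halves followed by an add-with-carry of the high halves, which is also why the carry flag matters for multiword arithmetic. A minimal Python model; the `add`/`addc` mnemonics in the comments are hypothetical, and in a gcc port this split would typically be described for the 32-bit `SImode` add.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """One 16-bit ALU add (add-with-carry when carry_in is used); returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def add32(x, y):
    """32-bit add split into two 16-bit operations (x and y in 0 .. 2**32 - 1)."""
    lo, carry = add16(x & MASK16, y & MASK16)      # add  lo, lo, lo
    hi, _ = add16(x >> 16, y >> 16, carry)         # addc hi, hi, hi
    return (hi << 16) | lo


print(hex(add32(0x0001FFFF, 0x00000001)))  # 0x20000
```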


Anyway, thanks for all the advice. I'll try to come up with a
good instruction set (efficient both to implement and for running
code).



Sylvain

Re: Design a microcontroller for gcc

2006-02-15 Thread Sylvain Munaut

Hans-Peter Nilsson wrote:
> On Wed, 15 Feb 2006, Sylvain Munaut wrote:
> 
>>  * 2 flags Carry & Zero for testing.
> 
> 
> I think most of your questions have been answered, so let me
> just add that if nothing else, the port will be much simplified
> if you make sure that only specific compare instructions set
> condition codes, i.e. not as a nice side-effect of move, add and
> sub - or at least make such condition-code side-effects
> optional.  It depends on too many undisclosed details like
> pipeline restrictions to say whether performance is generally
> better or worse, but I can tell for sure that the GCC port will
> be simpler with a specific set of condition-code setting insns.

Making it optional is neither hard nor expensive in hardware; the problem
is that my opcodes need to be 18 bits and I won't have room to stuff in
another option bit ...

What I have in mind for the moment is:
 - sign is always the MSB of the last ALU output
 - add/sub modify all flags
 - move/xor/and/not/or only affect zero (and sign)
 - shift operations always affect carry and zero
 - some specific instructions like compare and test, but these
   would only operate on registers (not on immediates)

What's so bad about having the flags as side effects?

This is a simple MCU: it doesn't have a very long pipeline, and that
pipeline is 'almost' invisible to the end user, except for memory
fetches and I/O accesses ...


> BTW, it depends on the compare (and branch) instructions whether
> just two flags are sufficient.
Adding sign and overflow is pretty easy, and the compare
instruction/logic path shouldn't be a problem either.

MIPS has no flags??? How does branching work?
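For reference, MIPS indeed has no condition-code register: `beq`/`bne` compare two registers directly (with `$zero` available as one operand), and ordered comparisons first write a 0/1 value into a general register with `slt`/`sltu`; the assembler's `blt` macro expands to `slt` into the `$at` temporary followed by `bne` against `$zero`. The Python below models only that idea, for 32-bit registers and ignoring delay slots; the function names mirror the MIPS mnemonics but are illustrative.

```python
WIDTH = 32
ZERO = 0                                   # $zero always reads as 0


def to_signed(x):
    return x - (1 << WIDTH) if x >> (WIDTH - 1) else x


def slt(rs, rt):
    """Set on less than (signed): the result goes to a GPR, no flags involved."""
    return int(to_signed(rs) < to_signed(rt))


def sltu(rs, rt):
    """Set on less than, unsigned."""
    return int(rs < rt)


def bne(rs, rt):
    """Branch if not equal: True means the branch is taken."""
    return rs != rt


def blt(rs, rt):
    """Pseudo-op: slt $at, rs, rt ; bne $at, $zero, label."""
    at = slt(rs, rt)
    return bne(at, ZERO)


print(blt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))  # True 0
```

The cost is an extra register for the comparison result; in exchange, no other instruction can clobber a flag, which sidesteps the side-effect question entirely.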



Finally, about immediates, I'm thinking that instructions like add
could have 4 different forms:
add rD, rA, rB
add rA, rA, imm
add rA, rA, imm<<8
add rA, rA, signextend(imm)

Is that kind of manipulation of the immediate well understood by gcc's
internals?

Or maybe only allow immediates in mov, but that seems like a big
penalty ...
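Assuming 16-bit registers and an 8-bit immediate field, and reading the plain `imm` form as zero-extended, the three immediate forms cover 0x0000-0x00FF, any value with a zero low byte, and 0xFF80-0xFFFF; anything else takes two adds. On the gcc side such ranges are typically described as constraints that accept or reject a `CONST_INT`, and the test each one performs looks roughly like this Python sketch (form names are illustrative):

```python
def single_form(value):
    """Return (form, imm8) if one add can apply `value`, else None."""
    value &= 0xFFFF
    if value <= 0x00FF:
        return ("imm", value)                  # zero-extended imm8
    if (value & 0x00FF) == 0:
        return ("imm<<8", value >> 8)          # imm8 in the high byte
    if value >= 0xFF80:
        return ("sext", value & 0xFF)          # sign-extended imm8 (negative)
    return None


def add_sequence(value):
    """One add if possible, otherwise high byte then low byte."""
    value &= 0xFFFF
    form = single_form(value)
    if form is not None:
        return [form]
    # Adding the low byte (< 0x100) to hi<<8 cannot carry into the high byte.
    return [("imm<<8", value >> 8), ("imm", value & 0xFF)]


def apply(reg, steps):
    """Execute the add sequence on a 16-bit register."""
    for form, imm in steps:
        if form == "imm":
            reg += imm
        elif form == "imm<<8":
            reg += imm << 8
        else:                                  # "sext"
            reg += imm - 0x100 if imm & 0x80 else imm
        reg &= 0xFFFF
    return reg


print(add_sequence(0x1234))   # [('imm<<8', 18), ('imm', 52)]
print(single_form(0xFFFE))    # ('sext', 254)
```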


Sylvain

Re: Design a microcontroller for gcc

2006-02-16 Thread Sylvain Munaut

Hans-Peter Nilsson wrote:
> On Wed, 15 Feb 2006, DJ Delorie wrote:
>   I wrote:
> 
>>>Anyway, at least keep a way to add reg+reg and reg+integer, load and
>>>store of memory and load of integer and address without condition
>>>code effects and your port has a chance to avoid the related bloat.
>>
>>At least, move/load/store shouldn't touch flags.
> 
> 
> I may have listed more operations than necessary, but the
> important bit is that you need to form arbitrary addresses in
> the stack frame without touching flags.  If for any const_int N,
> (plus reg N) is a valid address for moves to and from memory
> that doesn't touch flags, then I suppose you don't *need* an
> "add" that doesn't touch flags.

Move/load/store without flags is no problem. But for add, carry is needed
to allow multiword adds, and I can't make it optional.

But as you said, I could make the load/store take 3 args, either
load rD, rB(rA)
or
load rD, imm4(rA)

with imm4 being between -16 and 15.

Another thing about memory: I can't do 8-bit accesses. The memory is 16
bits wide and I can't change that, so 8-bit accesses will have to be done
in software.
Also, I could make the address a word address or a byte address
(but then I would just drop the LSB since I don't support unaligned
accesses ... and the immediate in load/store would be any even value between
-32 and 30).
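Doing 8-bit accesses in software on a 16-bit-wide memory means a byte load is a word load plus a shift and mask, and a byte store is a read-modify-write of the containing word. A minimal Python sketch, assuming byte addresses and little-endian byte order within a word (the thread doesn't specify an order); `WordMemory`, `load_byte` and `store_byte` are illustrative names:

```python
class WordMemory:
    """16-bit-wide RAM that only supports aligned 16-bit reads and writes."""

    def __init__(self, words):
        self.words = [0] * words

    def read16(self, byte_addr):
        assert byte_addr % 2 == 0, "unaligned access not supported"
        return self.words[byte_addr >> 1]

    def write16(self, byte_addr, value):
        assert byte_addr % 2 == 0, "unaligned access not supported"
        self.words[byte_addr >> 1] = value & 0xFFFF


def load_byte(mem, addr):
    word = mem.read16(addr & ~1)               # fetch the containing word
    shift = 8 * (addr & 1)                     # little-endian: odd byte is the high half
    return (word >> shift) & 0xFF


def store_byte(mem, addr, value):
    word = mem.read16(addr & ~1)               # read ...
    shift = 8 * (addr & 1)
    word = (word & ~(0xFF << shift)) | ((value & 0xFF) << shift)   # ... modify ...
    mem.write16(addr & ~1, word)               # ... write back


mem = WordMemory(8)
store_byte(mem, 3, 0xAB)
store_byte(mem, 2, 0xCD)
print(hex(mem.read16(2)), hex(load_byte(mem, 3)))  # 0xabcd 0xab
```

The read-modify-write is not atomic: if an interrupt handler writes the other byte of the same word between the read and the write, its update is lost.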

DJ Delorie wrote:
> You have to think about what kind of constants are going to be common
> in your software, and plan accordingly.

I can see several types of immediates:
1) Completely arbitrary constants, like filter coefficients.
2) Small positive/negative integers: e.g. to increment or walk through an array
3) Single bits or groups of bits anywhere in the word (to set/test bits)
4) Powers of 2 minus 1 (2^N - 1): to do modulo / masking.

For class 1, there's not much to do about it ... Those will have to be
loaded with several operations ...
To handle classes 2/3/4 in the operations taking an immediate (other than
mov), I was thinking of allowing a 4-bit immediate that could be
placed in any nibble, with the nibbles on its left filled with either
all 1s or all 0s, and the nibbles on its right also filled with all 1s or
all 0s (independently).

So, for example, 0x003f would be possible (3 in the second nibble, 0-filled
on the left and 1-filled on the right), as would 0xfff1, but not 0x0370 ...

Now the problem is to properly describe to gcc what can be taken as an
immediate and what can't ...
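The proposed nibble encoding is small enough to enumerate: 4 positions × 16 values × 2 left fills × 2 right fills gives at most 256 bit patterns, so checking whether a constant is encodable is a table lookup. The Python sketch below (illustrative names, 16-bit words assumed) builds that table; a predicate or constraint in the port's machine description would have to perform the same test on a `CONST_INT`.

```python
def nibble_imm(pos, val, left_ones, right_ones):
    """16-bit constant with `val` in nibble `pos` (0 = least significant)."""
    value = val << (4 * pos)
    if left_ones:
        value |= (0xFFFF << (4 * (pos + 1))) & 0xFFFF   # every nibble above pos
    if right_ones:
        value |= (1 << (4 * pos)) - 1                    # every nibble below pos
    return value


# Every encodable constant, mapped to one (pos, val, left_ones, right_ones) choice.
ENCODABLE = {
    nibble_imm(pos, val, left, right): (pos, val, left, right)
    for pos in range(4)
    for val in range(16)
    for left in (False, True)
    for right in (False, True)
}


def encode_nibble_imm(value):
    """Return an encoding for a 16-bit constant, or None if it needs several insns."""
    return ENCODABLE.get(value & 0xFFFF)


print(encode_nibble_imm(0x003F))  # (1, 3, False, True)
print(encode_nibble_imm(0x0370))  # None
```

The scheme also covers inverted single-bit masks such as 0xFBFF (value 0xB in nibble 2, 1-filled on both sides), which is handy for clearing bits.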

Anyway, thanks for your advice! I may not be able to follow all of it
because I also have hardware constraints, but I do appreciate it. It may
be some time before I actually start working on gcc (first I
have to actually build the hardware and port binutils ;)

Sylvain

Re: Design a microcontroller for gcc

2006-02-20 Thread Sylvain Munaut

Hans-Peter Nilsson wrote:
> On Thu, 16 Feb 2006, Sylvain Munaut wrote:
> 
>>Move/load/store without flags is no problem. But for add, carry is needed
>>to allow multiword adds, and I can't make it optional.
> 
> 
> As I hinted, perhaps you can have the multiword carry a separate
> one from the flags carry, perhaps moved over with a separate
> instruction?

Yes, two sets of flags look good: one for arithmetic operations (shift
through carry, multiword add, ...) and the other only for compare operations.

With either a bit to select which set of flags to use in conditional
jumps, or a way to copy one set into the other (I'll see which is
easier to implement, but I'd prefer the first option).


>>Also, I could make the address a word address or a byte address
>>(but then I would just drop the LSB since I don't support unaligned
>>accesses ... and the immediate in load/store would be any even value between
>>-32 and 30).
>
> Stick with byte addresses.  Really, really really.  Word
> addresses used to be somehow supported, but there are many bugs
> and no other working port does it.  Having the imm4 be bits 5..1
> and bit 0 constant 0 is certainly the right thing to do for
> 16-bit-wide accesses.

Ok, so be it then.
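Following Hans-Peter's suggestion, the load/store offset stays a byte offset, and only bits 5..1 are stored since bit 0 is always 0. Note that the range -16..15 quoted earlier needs a 5-bit two's-complement field, so this sketch (Python, illustrative names) assumes 5 stored bits, giving even byte offsets from -32 to 30:

```python
FIELD_BITS = 5   # -16..15 needs a 5-bit two's-complement field


def encode_offset(byte_offset):
    """Encode an even byte offset in -32..30 as the field holding bits 5..1."""
    if byte_offset % 2 or not -32 <= byte_offset <= 30:
        raise ValueError(f"offset {byte_offset} not encodable")
    return (byte_offset >> 1) & ((1 << FIELD_BITS) - 1)


def decode_offset(field):
    """Sign-extend the field and restore the implicit zero bit 0."""
    value = field - (1 << FIELD_BITS) if field >> (FIELD_BITS - 1) else field
    return value << 1


print(encode_offset(-32), decode_offset(encode_offset(-32)))  # 16 -32
```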


Sylvain
