Design a microcontroller for gcc
Hello,

I'm currently considering writing my own microcontroller to use on an FPGA. Since I'd like to be able to use C to write the "non-timing-critical" parts of my code, I thought I'd include a gcc port as part of the design considerations.

I've read the online manual about gcc backends and googled for comments about gcc for microcontrollers, and I found this thread particularly interesting:
http://gcc.gnu.org/ml/gcc/2003-03/msg01402.html

Here is a short description of the type of microcontroller I'm thinking of:

 * 8 bit RISC microcontroller
 * 16 general purpose registers
 * 16 level deep hardware call stack
 * no special instruction or regs for the stack
 * load/store to a flat "memory" (minimum 1k, up to 16k)
   (addressing in that memory uses a register for the 8 lsb and
   a special register for the msbs)
   that memory can be loaded along with the code to have known
   content at startup, for example.
 * classic add/sub/... have 3 register operands (dest,src1,src2)
 * no hardware mul, nor shift-by-n (have to be done in software)
 * pipelined, with some restrictions on instruction scheduling (see below)
 * a special "I/O" space (gcc shouldn't care, we should only access it
   in asm)
 * relative short/long call/branch, absolute short/long call/branch.
   (The long branches/jumps are achieved by preloading a special GPR with
   the MSBs to use for the next jump)
 * 2 flags, Carry & Zero, for testing.

(Note these are not random choices, I know how I can implement that in an FPGA quite efficiently ...)

My first question would be: "Do you see anything missing that would be a great plus?"

I mentioned earlier that there are some scheduling restrictions on the instructions due to internal pipelining. For example, the result of a fetch from memory may not be used in the instruction directly following the fetch. When there is a conditional branch, the instruction just following the branch will always be executed, no matter what the result is (branch/call/... are not immediate but have a 1 instruction latency before they occur). Are these kinds of limitations 'easily' supported by gcc?

I have seen several times that gcc works better with a lot of GPRs. I could increase them to 32, but then arithmetic instructions would have to use the same register for the destination as for src1. E.g. for the add instruction, I'm planning:

    add rD, rA, rB   ; rD = rA + rB
    add rD, rD, imm8 ; rD = rD + immediate_8bits

but if I allow 32 registers, my opcode is too short, so I have to limit it to:

    add rD, rD, rA   ; rD = rD + rA
    add rD, rD, imm8 ; rD = rD + immediate_8bits

I'm more in favor of only 16 registers because:

 - Some uc have even fewer and have a working gcc
 - 16 is already a good number
 - 16 uses less space in the FPGA ;p
 - I'd rather have the possibility to store the result elsewhere than
   have 32 registers.

But maybe I'm wrong ...

Any comments, thoughts, ... are welcome!

Sylvain
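To make the load-use restriction described above concrete, here is a minimal sketch in C of a post-pass that pads a hazard with a NOP. It is only a model: the `struct insn` record, the `OP_*` kinds and the function names are hypothetical and do not correspond to any gcc data structure, and a real compiler would try to schedule a useful instruction into the gap rather than a NOP.

```c
#include <stddef.h>

/* Hypothetical instruction record; the field names are illustrative only. */
enum op_kind { OP_ALU, OP_LOAD, OP_NOP };

struct insn {
    enum op_kind kind;
    int dest;   /* destination register, -1 if none */
    int src1;   /* first source register, -1 if unused */
    int src2;   /* second source register, -1 if unused */
};

/* True if 'next' reads the register that the load 'prev' writes. */
static int load_use_hazard(const struct insn *prev, const struct insn *next)
{
    if (prev->kind != OP_LOAD || prev->dest < 0)
        return 0;
    return next->src1 == prev->dest || next->src2 == prev->dest;
}

/* Copy 'in' to 'out', inserting a NOP after any load whose result is
 * read by the very next instruction. Returns the number of instructions
 * written, or 0 if 'out' is too small. */
static size_t insert_load_nops(const struct insn *in, size_t n,
                               struct insn *out, size_t out_cap)
{
    static const struct insn nop = { OP_NOP, -1, -1, -1 };
    size_t w = 0;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && load_use_hazard(&in[i - 1], &in[i])) {
            if (w == out_cap)
                return 0;
            out[w++] = nop;
        }
        if (w == out_cap)
            return 0;
        out[w++] = in[i];
    }
    return w;
}
```

For a three-instruction sequence where a load into r1 is immediately followed by an ALU operation reading r1, the pass emits four instructions with the NOP in second position.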
Re: Design a microcontroller for gcc
DJ Delorie wrote:
>> * 8 bit RISC microcontroller
>
> Not 16?

Well, at first it was to save space in the FPGA (basically, the regs and ALU take twice the space) and because many 16 bit ops can be done with 8 bit regs and lots of the code I write can live with that. But I had a quick glance at the diagrams I drew, and with the modifications I would need to support 16 bit pointers with 8 bit regs, the hardware "tricks" might end up costing me more than the doubled reg banks and ALU.

Another reason is speed ... I'd like to run that thing at 133 MHz in a low cost Spartan 3 ... (2 cycles per instruction, so 66 MIPS at the end) and a 16 bit carry chain is ... well ... longer ;) I'll do some timing tests to see if it's viable.

The final reason is immediates. I can't have 16 bit immediates in my opcode, there is just not the room ... (opcodes are 18 bits, the width of a classical RAM block in my FPGA technology).

Do you have a 16 bit uc that fits gcc really well that I could use as a guide for the instruction set? Using 16 bit regs, do I have to support 8 bit operations on them?

>> * 16 level deep hardware call stack
>
> If you have RAM, why not use it? Model calls like the PPC - put
> current $pc in a register and jump. The caller saves the old $pc in
> the regular stack. GCC is going to want a "normal" frame. This is
> easy to do in hardware, and more flexible than a hardware call stack.

Well, that solution isn't that easy with 8 bit regs, and the 16 deep stack is the same as a single register in the FPGAs I use. But now, with 16 bit regs, it might simplify this and that may become a better solution.

>> * load/store to a flat "memory" (minimum 1k, up to 16k)
>>(addressing in that memory uses a register for the 8 lsb and
>> a special register for the msbs)
>>that memory can be loaded along with the code to have known
>>content at startup for eg.
>
> GCC is going to want register pairs to work as larger registers.
> Like, if you have $r2 and $r3 as 8 bit registers, gcc wants [$r2$r3]
> to be usable as a 16 bit register. Another reason to go with 16 bit
> registers ;-)
>
> GCC won't like having an address split across "special" registers.
>
> But it's OK to limit index registers to evenly numbered ones.

So if I use 16 bit registers, do I have to handle pairs of them to form 32 bits?

16 bit regs keep sounding better and better ... I know HDL better than gcc, so if it can ease the gcc port at the cost of a slightly bigger/more complex HDL design, I'm willing to make the trade-off.

>> * 2 flags Carry & Zero for testing.
>
> GCC will want 4 (add sign and overflow) to support signed comparisons.
> Sign should be easy; overflow is the carry out of bit 6.

Shouldn't be much of a problem to add that.

>>I mentionned earlier that there is some scheduling restriction on the
>>instructions due to internal pipelining. For example, the result of a
>>fetch from memory may not be used in the instruction directly following
>>the fetch. When there is a conditionnal branch, the instruction just
>>following the branch will always be executed, no matter what the result
>>is (branch/call/... are not immediate but have a 1 instruction latency
>>beforce the occur). Is theses kind of limitation 'easily' supported by
>>gcc ?
>
> Delay slots are common; gcc handles them well. You might need to add
> custom code to enforce the pipeline rules if your pipeline won't
> automatically stall.

Yes, making the pipeline stall isn't easy in my case, and detecting those cases will cost me logic and decrease my timing margin. The hardware detects that the preceding instruction writes to a register read by the current instruction and forwards the result. But for memory fetches (and I/O accesses), there is just no way, the result appears too late ...

Sylvain
Re: Design a microcontroller for gcc
DJ Delorie wrote:
>>So If I use 16 bits registers, do I have to handle pairs of them to form
>>32 bits ?
>
> Well, you don't *have* to if your word size is only 16 bits. GCC will
> still pair them, but you'll need to tell gcc how to split them back up
> for the opcodes you have available.
>
> Note that there are some operations that gcc assumes you have 32-bit
> opcodes for, though. Or at least insns that emulate it.

Like what?

> You really want to have native support for sizeof(int) values and
> sizeof(void *) values, bigger things can be emulated or broken up.

Ok, then so it will be ;)

Anyway, thanks for all the advice. I'll try to come up with a good instruction set (efficient both to implement and for running code).

Sylvain
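As an illustration of what "splitting a pair back up" means arithmetically, the sketch below models a 32 bit add done as two 16 bit adds chained through a carry. It is plain C, not gcc machine-description code, and the names `pair16`, `add16_carry` and `add32_split` are made up for this example.

```c
#include <stdint.h>

/* A 32 bit value held in two 16 bit registers (hypothetical pair layout). */
struct pair16 {
    uint16_t lo;
    uint16_t hi;
};

/* 16 bit add with carry-in; stores the sum and returns the carry-out. */
static unsigned add16_carry(uint16_t a, uint16_t b, unsigned cin, uint16_t *out)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b + (cin & 1u);

    *out = (uint16_t)wide;
    return (unsigned)(wide >> 16) & 1u;
}

/* 32 bit add expressed as: add on the low halves, then add-with-carry
 * on the high halves. The final carry-out is discarded, as in C. */
static struct pair16 add32_split(struct pair16 x, struct pair16 y)
{
    struct pair16 r;
    unsigned carry = add16_carry(x.lo, y.lo, 0, &r.lo);

    (void)add16_carry(x.hi, y.hi, carry, &r.hi);
    return r;
}
```

This is also why a carry flag that survives from the low-half add to the high-half add matters for anything wider than the native word.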
Re: Design a microcontroller for gcc
Hans-Peter Nilsson wrote:
> On Wed, 15 Feb 2006, Sylvain Munaut wrote:
>
>> * 2 flags Carry & Zero for testing.
>
> I think most of your questions have been answered, so let me
> just add that if nothing else, the port will be much simplified
> if you make sure that only specific compare instructions set
> condition codes, i.e. not as a nice side-effect of move, add and
> sub - or at least make such condition-code side-effects
> optional. It depends on too many undisclosed details like
> pipeline restrictions to say whether performance is generally
> better or worse, but I can tell for sure that the GCC port will
> be simpler with a specific set of condition-code setting insns.

Making it optional is neither hard nor expensive in hardware. The problem is that my opcodes need to be 18 bits and I won't have space to stuff in another option bit ...

What I was thinking for the moment was to have:

 - sign is always the msb of the last ALU output
 - add/sub modify all flags
 - move/xor/and/not/or only affect zero (and sign)
 - shift operations always affect carry and zero
 - some specific instructions like compare and test, but these
   would only operate on registers (and not on immediates)

What's so bad about having the flags as side-effects? Here it's a simple MCU, it doesn't have a very long pipeline and that pipeline is 'almost' invisible to the end-user, except for memory fetches and I/O accesses ...

> BTW, it depends on the compare (and branch) instructions whether
> just two flags are sufficient.

Adding Sign and Overflow is pretty easy. And the compare instruction/logic path shouldn't be a problem either.

MIPS has no flags??? How does branching work?

Finally, about immediates, I'm thinking that an instruction like add could have 4 different forms:

    add rD, rA, rB
    add rA, rA, imm
    add rA, rA, imm<<8
    add rA, rA, signextend(imm)

Is that kind of manipulation of the immediate well understood by gcc internals?

Or maybe just allow immediates in the mov, but that seems like a big penalty ...

Sylvain
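For reference, the three immediate forms proposed above can be modelled in a few lines of C. This is a sketch under two assumptions that the message does not state explicitly: the immediate field is 8 bits wide and the registers are 16 bits wide. The names `imm_form`, `expand_imm` and `encode_imm` are invented for the example; `encode_imm` plays the role of the "is this constant usable as an immediate?" test that a port would have to describe to the compiler.

```c
#include <stdint.h>

/* The three immediate forms: imm, imm<<8 and signextend(imm). */
enum imm_form { IMM_LOW, IMM_HIGH, IMM_SEXT };

/* Expand an 8 bit immediate field into the 16 bit operand the ALU sees. */
static uint16_t expand_imm(uint8_t imm, enum imm_form form)
{
    switch (form) {
    case IMM_LOW:
        return imm;                              /* 0x00NN */
    case IMM_HIGH:
        return (uint16_t)((unsigned)imm << 8);   /* 0xNN00 */
    case IMM_SEXT:
        return (imm & 0x80u) ? (uint16_t)(0xff00u | imm) : imm;
    }
    return 0;
}

/* Try to encode a 16 bit constant. Returns 1 and fills *imm and *form
 * if one of the three forms can represent it, 0 otherwise. */
static int encode_imm(uint16_t value, uint8_t *imm, enum imm_form *form)
{
    if ((value & 0xff00u) == 0) {
        *imm = (uint8_t)value;
        *form = IMM_LOW;
        return 1;
    }
    if ((value & 0x00ffu) == 0) {
        *imm = (uint8_t)(value >> 8);
        *form = IMM_HIGH;
        return 1;
    }
    if ((value & 0xff80u) == 0xff80u) {          /* small negative numbers */
        *imm = (uint8_t)value;
        *form = IMM_SEXT;
        return 1;
    }
    return 0;
}
```

Under these assumptions a constant such as 0x1234 fits none of the forms and would have to be built in a register first.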
Re: Design a microcontroller for gcc
Hans-Peter Nilsson wrote:
> On Wed, 15 Feb 2006, DJ Delorie wrote:
> I wrote:
>
>>>Anyway, at least keep a way to add reg+reg and reg+integer, load and
>>>store of memory and load of integer and address without condition
>>>code effects and your port has a chance to avoid the related bloat.
>>
>>At least, move/load/store shouldn't touch flags.
>
> I may have listed more operations than necessary, but the
> important bit is that you need to form arbitrary addresses in
> the stack frame without touching flags. If for any const_int N,
> (plus reg N) is a valid address for moves to and from memory
> that doesn't touch flags, then I suppose you don't *need* an
> "add" that doesn't touch flags.

Move/Load/Store without flags is no problem. But for add, to allow multiword add, carry is needed and I can't make it optional.

But as you said, I could make the load/store take 3 args, either

    load rD, rB(rA)

or

    load rD, imm4(rA)

with imm4 being between -16 and 15.

Another thing about memory: I can't make 8 bit accesses. The memory is 16 bits wide and I can't change that, so 8 bit accesses will have to be done in sw. Also, I could make the address given a word address or a byte address (but then I would just drop the LSB since I don't support unaligned access ... and the immediate in load/store would be each even number between -32 and 30).

DJ Delorie wrote:
> You have to think about what kind of constants are going to be common
> in your software, and plan accordingly.

I can see several types of immediates:

 1) Completely arbitrary constants like filter coefficients, stuff like that.
 2) Small positive/negative integers: e.g. to increment or walk through an array.
 3) Single bits or grouped bits anywhere in the word (to set/test bits).
 4) Powers of 2 minus 1 (2^N - 1): to do modulo / masking.

For class 1, there is not much to do about it ... Those will have to be loaded with several operations ...

To handle classes 2/3/4 in the operations taking an immediate (other than mov), I was thinking of allowing a 4 bit immediate that could be placed in any nibble; the nibbles on its left could be filled with either 1 or 0, and the nibbles on its right could also be filled with either 1 or 0 (independently).

So for example 0x003f would be possible (3 in the second nibble, 0-filled on the left and 1-filled on the right). 0xfff1 also, but not 0x0370 for example ...

Now the problem is how to describe properly to gcc what can be taken as an immediate and what can't ...

Anyway, thanks for your advice! I may not be able to follow all of it because I also have hw constraints, but I do appreciate it. It may take some time before I actually start the work on gcc (first I have to actually do the hw and port binutils ;)

Sylvain
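The nibble-with-fill scheme described above is easy to pin down with a small C model. This is a sketch assuming 16 bit words, with nibble position 0 being the least significant; the function names are invented for the example. `nibble_imm_ok` answers the question a port would ultimately have to encode: is a given constant representable?

```c
#include <stdint.h>

/* Build the 16 bit constant for a 4 bit value 'v' placed in nibble 'pos'
 * (0 = least significant), with the nibbles to its left filled with
 * 'fill_left' (0 or 1) and those to its right with 'fill_right'. */
static uint16_t nibble_imm_expand(unsigned v, unsigned pos,
                                  unsigned fill_left, unsigned fill_right)
{
    unsigned shift = 4u * pos;
    uint16_t below = (uint16_t)((1u << shift) - 1u);  /* bits right of the nibble */
    uint16_t field = (uint16_t)(0xfu << shift);       /* the nibble itself */
    uint16_t above = (uint16_t)~(below | field);      /* bits left of the nibble */
    uint16_t r = (uint16_t)((v & 0xfu) << shift);

    if (fill_left)
        r |= above;
    if (fill_right)
        r |= below;
    return r;
}

/* Return 1 if 'value' can be produced by some nibble position and
 * some combination of fills, 0 otherwise. */
static int nibble_imm_ok(uint16_t value)
{
    for (unsigned pos = 0; pos < 4; pos++) {
        unsigned v = (value >> (4u * pos)) & 0xfu;

        for (unsigned fl = 0; fl < 2; fl++)
            for (unsigned fr = 0; fr < 2; fr++)
                if (nibble_imm_expand(v, pos, fl, fr) == value)
                    return 1;
    }
    return 0;
}
```

With this model, 0x003f and 0xfff1 are accepted and 0x0370 is rejected, matching the examples in the message. Single-bit constants such as 0x0100 and masks such as 0x00ff are accepted as well.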
Re: Design a microcontroller for gcc
Hans-Peter Nilsson wrote:
> On Thu, 16 Feb 2006, Sylvain Munaut wrote:
>
>>Move/Load/Store without flag is no problem. But for add, to allow
>>multiword add, carry is needed and I can't make it optionnal.
>
> As I hinted, perhaps you can have the multiword carry a separate
> one from the flags carry, perhaps moved over with a separate
> instruction?

Yes, two sets of flags look good. One for arithmetic operations (shift through carry, multiword add, ...) and the other only for compare operations, with either a bit to select which set of flags to use during conditional jumps or a way to copy one set into the other (I'll see which is easier to implement, but I'd prefer the first possibility).

>>Also, I could make the address given a word address or a byte address
>>(but then I would just drop the LSB since i don't support unaligned
>>access ... and the immediate in load/store would be each even between
>>-32 and 30).
>
> Stick with byte addresses. Really, really really. Word
> addresses used to be somehow supported, but there are many bugs
> and no other working port does it. Having the imm4 be bits 5..1
> and bit 0 constant 0 is certainly the right thing to do for
> 16-bit-wide accesses.

Ok, so be it then.

Sylvain
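A small C model may help to see what byte addressing on a 16 bit wide memory implies. It is a sketch under several assumptions: the offset field is treated as a 5 bit signed value (which is what the quoted ranges of -16..15 words, i.e. even byte offsets -32..30, imply), the memory size is arbitrary, and the byte order within a word is taken to be little-endian. All names are made up for the example.

```c
#include <stdint.h>

#define MEM_WORDS 512u              /* illustrative size: 1 KiB of 16 bit words */

static uint16_t mem[MEM_WORDS];

/* Turn the raw offset field into a byte offset. The field is assumed to
 * be a 5 bit signed word count, so the result is even and in -32..30. */
static int offset_from_field(unsigned field5)
{
    int words = (int)(field5 & 0x1fu);

    if (words & 0x10)
        words -= 32;                /* sign-extend 5 bits */
    return words * 2;               /* bit 0 of the byte offset is always 0 */
}

/* 16 bit load from a byte address; the LSB is dropped because the
 * hardware has no unaligned access. */
static uint16_t load16(uint16_t base, unsigned field5)
{
    uint16_t addr = (uint16_t)(base + offset_from_field(field5));

    return mem[(addr >> 1) % MEM_WORDS];
}

/* 8 bit load done in software: fetch the word, pick one half.
 * Little-endian byte order within the word is an assumption. */
static uint8_t load8_sw(uint16_t byte_addr)
{
    uint16_t w = mem[(byte_addr >> 1) % MEM_WORDS];

    return (byte_addr & 1u) ? (uint8_t)(w >> 8) : (uint8_t)w;
}

/* 8 bit store done in software as a read-modify-write of the word. */
static void store8_sw(uint16_t byte_addr, uint8_t v)
{
    uint16_t *w = &mem[(byte_addr >> 1) % MEM_WORDS];

    if (byte_addr & 1u)
        *w = (uint16_t)((*w & 0x00ffu) | ((unsigned)v << 8));
    else
        *w = (uint16_t)((*w & 0xff00u) | v);
}
```

The read-modify-write in `store8_sw` is the cost of having no byte-wide store in hardware: every 8 bit store becomes a load, a mask and a 16 bit store.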