There are 10 function units:
(1) 2 RISCs: the 2 RISCs have the same capabilities and can do load/store, full-word arithmetic/logic
operations, register moves, ...
(2) 4 DSPs (2 MACs, 1 BSU, and 1 VFU):
* MAC: multiply-accumulate and SIMD arithmetic operations
* BSU: packing/unpacking, determine absolute value, average, ...
* VFU: packing/unpacking, swap, bit reverse, determine min/max, ...
(3) 4 CFUs (Customized Function Units, which do some MPEG4-decoding-related operations):
* VLD CFU: MV/DC/AC decoder
* DCT/IDCT CFU: some instructions for DCT/IDCT
* MC CFU: some instructions for motion compensation/estimation
* (the fourth CFU is not implemented yet)
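To make the unit list concrete, it could be written down as a small C table. This is only a sketch and not code from my port: risc0, mac0 and mac1 are the unit names used in the assembly example in this post, while risc1, bsu0, vfu0, vld0, dct0, mc0 and cfu3 are placeholder names I am assuming for illustration, and `find_unit` and `count_units` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three classes of function units described above. */
enum unit_class { UNIT_RISC, UNIT_DSP, UNIT_CFU };

struct function_unit {
    const char *name;        /* name printed after the '.' in assembly */
    enum unit_class cls;     /* which class the unit belongs to */
    int implemented;         /* 0 for the CFU that does not exist yet */
};

/* Only risc0, mac0 and mac1 appear in the assembly example;
   every other name here is an assumed placeholder. */
static const struct function_unit units[] = {
    { "risc0", UNIT_RISC, 1 },
    { "risc1", UNIT_RISC, 1 },
    { "mac0",  UNIT_DSP,  1 },
    { "mac1",  UNIT_DSP,  1 },
    { "bsu0",  UNIT_DSP,  1 },
    { "vfu0",  UNIT_DSP,  1 },
    { "vld0",  UNIT_CFU,  1 },
    { "dct0",  UNIT_CFU,  1 },
    { "mc0",   UNIT_CFU,  1 },
    { "cfu3",  UNIT_CFU,  0 },   /* not implemented yet */
};

#define NUM_UNITS (sizeof units / sizeof units[0])

/* Count the units that belong to one class. */
static size_t count_units(enum unit_class cls)
{
    size_t count = 0;
    for (size_t i = 0; i < NUM_UNITS; i++)
        if (units[i].cls == cls)
            count++;
    return count;
}

/* Look a unit up by name; returns NULL if the name is unknown. */
static const struct function_unit *find_unit(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < NUM_UNITS; i++)
        if (strcmp(units[i].name, name) == 0)
            return &units[i];
    return NULL;
}
```

With this table, `count_units(UNIT_DSP)` gives 4 and `find_unit("mac1")` returns the entry for the second MAC.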
There are 8 slots in a VLIW instruction bundle (i.e., at most 8 instructions can be issued in 1 cycle), and the assembly language syntax looks like: "<op code> <function unit name> <dst>, <src1>, <src2>, ..."
For example:
===================================[top]====================================
mov .risc0 r1, #25 \\
ldw .risc0 r2, [fp, #30] \\
addub .mac0 d0, d4, d3 \\
subub .mac1 d11, d7, d4
add .risc0 r3, r1, r5
===================================[end]====================================
(The symbol "\\" means "parallel": the next instruction is issued in the same cycle.) The first 4 instructions are in the same VLIW bundle (issued in the first cycle), and the last instruction is in another VLIW bundle (issued in the next cycle).
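The output rule described above can be modelled in a few lines of standalone C. This is a sketch of the behaviour I want, not GCC code: it assumes the scheduler has already given every instruction an issue cycle, and `struct sched_insn`, `format_bundles` and `MAX_SLOTS` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 8   /* a VLIW bundle has 8 slots */

/* One scheduled instruction: its assembly text and its issue cycle. */
struct sched_insn {
    const char *text;
    int cycle;
};

/* Write the instructions to `out` (capacity `cap`), one per line, and
   append " \\" to every instruction whose successor issues in the same
   cycle. Returns the number of bytes written, or -1 if the buffer is
   too small or a bundle would need more than MAX_SLOTS slots. */
static int format_bundles(const struct sched_insn *insns, size_t n,
                          char *out, size_t cap)
{
    size_t used = 0;
    int slots = 0;   /* instructions placed in the current bundle */

    assert(out != NULL && cap > 0);
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int same_bundle = (i + 1 < n) && insns[i + 1].cycle == insns[i].cycle;

        if (++slots > MAX_SLOTS)
            return -1;

        int len = snprintf(out + used, cap - used, "%s%s\n",
                           insns[i].text, same_bundle ? " \\\\" : "");
        if (len < 0 || (size_t)len >= cap - used)
            return -1;
        used += (size_t)len;

        if (!same_bundle)
            slots = 0;   /* the next instruction starts a new bundle */
    }
    return (int)used;
}
```

Called on the five instructions of the example with issue cycles 0, 0, 0, 0, 1, this reproduces the listing above: the first three lines end in "\\" and the last two do not.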
I plan to schedule the instructions using the "pipeline description". After reading Chapters 10 to 13 of the GCC internals manual, I currently have three questions:
(1) How can I output the parallel symbol "\\" in the final pass? Obviously I should append "\\" to every instruction that is followed by another instruction in the same bundle, but I could not find the corresponding target machine macros/hooks to do so.
(2) How can I fill in the <function unit name> field?
(Could questions (1) and (2) be solved by using the macro PRINT_OPERAND?)
(3) Should I put only one machine instruction in each instruction pattern? In ports for other platforms, I saw output templates that contain more than one machine instruction, for example: "add\\t%Q0, %Q0, %Q2\;adc\\t%R0, %R0, %R2". Some output templates call a C function to output several instructions, which may not share the same characteristics in the function unit pipeline. I'm worried that such "multi-instruction" output templates will confuse the DFA and put several instructions into a single VLIW bundle slot. Should I split them with define_split and design correspondingly refined instruction patterns for them?
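To find such patterns in an existing port, one could count the machine instructions a template emits. A minimal sketch, assuming the template has already been unescaped (so the "\;" in the example above is a plain ';' and "\\t" is a tab) and that ';' or a newline separates instructions; `count_template_insns` is a hypothetical helper, not part of GCC:

```c
#include <assert.h>
#include <stddef.h>

/* Count the machine instructions in an output template, assuming that
   ';' or '\n' separates them (an assumed convention for this sketch).
   Empty or whitespace-only pieces are not counted. */
static int count_template_insns(const char *tmpl)
{
    int count = 0;
    int in_insn = 0;   /* nonzero while inside a non-empty instruction */

    assert(tmpl != NULL);

    for (const char *p = tmpl; *p != '\0'; p++) {
        if (*p == ';' || *p == '\n') {
            in_insn = 0;               /* separator ends the instruction */
        } else if (!in_insn && *p != ' ' && *p != '\t') {
            in_insn = 1;               /* first character of a new one */
            count++;
        }
    }
    return count;
}
```

A result greater than 1 would mark a template that occupies one slot in the schedule but emits several instructions, i.e. a candidate for splitting.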