There are 10 function units:
(1) 2 RISCs: the two RISC units have the same capabilities; they can do load/store, full-word arithmetic/logic
operations, register moves, ...
(2) 4 DSPs ( 2 MAC, 1 BSU, and 1 VFU):
* MAC: can do the multiply-accumulate, and SIMD arithmetic operations
* BSU: packing/unpacking, determine absolute value, average, ...
* VFU: packing/unpacking, swap, bit reverse, determine min/max, ...
(3) 4 CFUs (Customized Function Units, which do some MPEG4-decoding-related operations):
* VLD CFU: MV/DC/AC decoder
* DCT/IDCT CFU: some instructions for DCT/IDCT
* MC CFU: some instructions for motion compensation/estimation
* (not implemented yet)
There are 8 slots in a VLIW instruction bundle (i.e., at most 8
instructions can be issued in 1 cycle),
and the assembly language syntax looks like:
"<op code> <function unit name> <dst>, <src1>, <src2>, ..."
For example:
===================================[top]====================================
mov .risc0 r1, #25 \\
ldw .risc0 r2, [fp, #30] \\
addub .mac0 d0, d4, d3 \\
subub .mac1 d11, d7, d4
add .risc0 r3, r1, r5
===================================[end]====================================
(The symbol "\\" means "parallel": the next instruction is issued in the
same cycle.)
The first 4 instructions are in the same VLIW bundle (issued in the first
cycle),
and the last instruction is in another VLIW bundle (issued in the next cycle).
I plan to schedule the instructions using the "pipeline description".
Currently I have three questions after reading Ch. 10 ~ 13 of the GCC Internals
manual:
(1) How can I output the parallel symbol "\\" in the final pass?
Obviously I should append "\\" to the instructions
that are in the same bundle,
but I could not find the corresponding target machine
macros/hooks to do so.
(2) How can I fill in the <function unit> field?
(Could questions (1) and (2) be solved by using the macro PRINT_OPERAND?)
(3) Should I put only one machine instruction in each instruction
pattern?
In ports for other platforms, I saw output templates that contain more than
one machine instruction.
For example: "add\\t%Q0, %Q0, %Q2\;adc\\t%R0, %R0, %R2".
Some output templates call a C function to output several
instructions, which would not have
the same characteristics in the function unit pipeline.
I'm worried that such "multi-instruction" output templates will
confuse the DFA
and will cause several instructions to end up in one VLIW bundle slot.
Should I split them with define_split and design the corresponding
refined instruction patterns for them?