I plan to up-port my ST20 port to the mainline. First I'd like to describe the port and ask for opinions. Basically, I would like to hear whether an implementation along these lines would be acceptable in principle and, if not, what the main no-go points are.

I'll try to keep it short.

The ST20 CPU (a transputer derivative) is a (register-)stack architecture and has no general registers. Our ports, too, use memory cells from the stack frame to allocate pseudos. This makes a lot of sense on ST20, as the first 16 words of the stack frame have special properties and are a scarce resource, and reloading these memory cells from other memory cells actually makes the code smaller and faster.

The first port (by Fabio Riccardi, against 2.95.2) was done in the common way: pseudos were mapped to such "memory registers", and the MACHINE_DEPENDENT_REORG pass then tried to save the situation by combining several insns into one, as long as the result evaluates on the 3-level-deep regstack, and by eliminating the memory registers that became unused.

While maintaining the port to pass the test suite, I had to constantly extend the machine-dep-reorg pass, which started to seriously resemble the combine pass in the core. Besides, I felt the port lacked flexibility, as one couldn't freely use 'define_expand's without also modifying 'machine-reorg'. Thirdly, combining insns after reload frequently yielded suboptimal use of memory registers.

The current port, done from scratch, tries to fix the above and to employ the 'combine' pass to its maximum instead of duplicating the code:

-- the .md grammar was extended to describe the separate stack operations making up an insn (e.g. load the first word onto the stack, load the second word onto the stack, subtract, store the result). I named the construct 'define_nonterm'. Old-style 'define_insn's are fully supported and are usually intermixed with the new ones in the .md file, as some insns won't use the regstack.
-- the generated 'recog_insn' is essentially a BURM labeler (I modified the algorithm from [1] to account for a finite stack depth and hacked 'genrecog' to generate it).
-- RTL expansion mostly uses 'define_expand's from the .md.
-- 'legitimate_address_p' has the usual sense before 'combine'.
-- 'combine' combines insns into one iff the combination can be BURM-labeled by 'recog_insn'.
-- past 'combine', 'legitimate_address_p' says OK to any RTX that evaluates on the regstack of depth 3, whatever form its address parts might have.
-- at the point where no new insns are emitted and their relative order won't change anymore (after reload?), the labeled insns are BURM-reduced back to separate stack operations. This time, regstack registers are explicitly mentioned in the RTX, and the .md must have a 'define_insn' matching each stack op. Splitting insns becomes possible again after this point.
-- RTX_COST returns the BURM cost of the RTL expression.

The downside:

-- the RTX structure had to be extended to store the BURM label data. This is per-port and may be configured in 'config.sub' via a corresponding compilation option. In my implementation, 12 to 16 bytes were added. I didn't attempt to compress this data as proposed in [1].
-- due to the way substitution in 'combine' works, insns frequently lose their BURM label data and thus require relabeling (the result of recognition is less cacheable than before). This could probably be improved to some extent.

(I followed the posts on GCC's speed and memory consumption, and am very well aware of the implications this would have. Please just bear in mind that ST20 developers currently have no alternative to the only compiler there is. If somebody knows a less costly algorithm for BURM-labeling RTX, please, please let me know.)

-- 'extract_operand' and some other routines that expect operands at fixed places in an insn had to be made more intelligent, so other 'genxxx' generators had to be modified too.

The results:

-- actually, rather encouraging. The quality of the emitted code is largely comparable to that of the manufacturer's compiler, sometimes better, and there is still a small margin for improvement.

[1] C. W. Fraser, D. R. Hanson, T. A. Proebsting, "Engineering a simple, efficient code-generator generator", LOPLAS Vol. 1, issue 3 (Sep. 1992)

Thanks in advance,
Dimitri
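
P.S. To make the "evaluates on the regstack of depth 3" test concrete, here is a minimal sketch of the kind of bottom-up check involved. It is not code from the port: the 'struct expr' tree stands in for RTX, the names 'stack_need' and 'fits_regstack' are made up for illustration, and it assumes that either operand of a binary operation may be evaluated first. It only computes the stack depth an expression needs; a real BURM labeler would also track rule costs per nonterminal.

```c
#include <stddef.h>

/* Hypothetical stand-in for an RTX: a leaf is one word loaded onto the
   regstack, a BINOP pops two words and pushes one result. */
enum kind { LEAF, BINOP };

struct expr {
    enum kind kind;
    const struct expr *lhs, *rhs;   /* used only when kind == BINOP */
};

#define REGSTACK_DEPTH 3            /* depth of the regstack */

/* Minimum number of regstack slots needed to evaluate E, computed
   bottom-up.  Assumption: the operand needing more slots can always be
   evaluated first, so the other one only adds a slot on a tie. */
static int stack_need(const struct expr *e)
{
    if (e->kind == LEAF)
        return 1;

    int l = stack_need(e->lhs);
    int r = stack_need(e->rhs);

    if (l == r)
        return l + 1;               /* one result is held while the other is computed */
    return l > r ? l : r;           /* deeper side first, the shallower side fits on top */
}

/* Nonzero iff E can be evaluated entirely on the regstack. */
static int fits_regstack(const struct expr *e)
{
    return stack_need(e) <= REGSTACK_DEPTH;
}
```

Under these assumptions, (a - b) - (c - d) needs all three slots and still fits, while a balanced tree one level deeper needs four and would have to go through a memory register.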