Re: [MIPS] MADD issue
Richard Sandiford wrote: As far as madd goes, I think it would be better to either (a) get combine to handle this situation or (b) get expand to generate a fused multiply-add from the outset. (b) sounds like it might be useful in its own right. At the moment we treat the generation of floating-point multiply-adds as an optimisation, but in some applications it's critical not to round the intermediate result. (I don't know if there's a bugzilla entry about this.) If we treated fused multiply-add as a primitive operation, we could extend it to integer types too. In this case we'd also need to handle widening multiplications, but we already need to do that for stand-alone multiplications. Richard While I agree with you philosophically, it feels like (b) might be quite a major task. A number of optimisation passes which currently recognise and MUL and PLUS separately (e.g. loop strength reduction) would now need to be extended to handle the fused MULPLUS and MULSUB operators. And although the reduction in instruction count due to your previous change is good, what is it as a percentage of the total? After all it only helps code which uses 64-bit integer types with a 32-bit ABI, which is probably quite a small proportion of most real-life applications -- whereas for some algorithms the ability to use MADD is absolutely critical to performance, and for them losing the ability to generate MADD is a significant backward step for the compiler. How about, as a workaround until (b) sees the light of day, we reimplement adddi3 and subdi3 only (not the other di mode patterns), qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented more cleanly nowadays, using define_insn_and_split and/or a "#" template, to avoid generating multi-instruction assembler sequences. Nigel
Re: [MIPS] MADD issue
Richard Sandiford wrote: Nigel Stephens <[EMAIL PROTECTED]> writes: While I agree with you philosophically, it feels like (b) might be quite a major task. A number of optimisation passes which currently recognise and MUL and PLUS separately (e.g. loop strength reduction) would now need to be extended to handle the fused MULPLUS and MULSUB operators. And although the reduction in instruction count due to your previous change is good, what is it as a percentage of the total? After all it only helps code which uses 64-bit integer types with a 32-bit ABI, which is probably quite a small proportion of most real-life applications -- whereas for some algorithms the ability to use MADD is absolutely critical to performance, and for them losing the ability to generate MADD is a significant backward step for the compiler. How about, as a workaround until (b) sees the light of day, we reimplement adddi3 and subdi3 only (not the other di mode patterns), qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented more cleanly nowadays, using define_insn_and_split and/or a "#" template, to avoid generating multi-instruction assembler sequences. The old patterns had a define_split too. That wasn't really the problem. If you don't want to add a tree code yet, it would still be possible to add the optab and expand support, recognising mult-add sequences in a similar way to how we recognise widening multiplies now. I feel at least that's a step in the right direction. OK, we'll have a think about that. Thanks Nigel
Re: [MIPS] MADD issue
Richard Sandiford wrote: Nigel Stephens <[EMAIL PROTECTED]> writes: I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, mcore, score, arm & pa backends still implement adddi3 as either a define_insn which outputs two instructions or an explicit define_expand followed define_split and associated sub patterns. Are we setting the bar too high for MIPS? :) I don't think that follows. The main reason that ports like rs6000, i386, arm, sparc and pa define adddi3 is because those architectures provide special add-with-carry type instructions, or similar architecture-specific optimisations. Right, good point. MIPS has nothing like that. Actually the MIPS DSP ASE does have addsc and addwc, which could be used for this purpose. Sadly not subsc and subwc, though. The old MIPS patterns just re-implemented the standard optabs version (and often did so less efficiently, as I said before). Whilst I'm sure that your proposal is the right one going forward, it still feels like it could be significant amount of work to implement. And the simplified optab/expand support would only work for multiply-add sequences expressed within a single expression, and wouldn't be able to optimise disjoint multiply, then add expressions. Or have I missed something. I don't think that follows either. Out-of-ssa should coalesce them if they are in the same basic block. And if they aren't in the same basic block, that's presumably because the tree optimisers think they shouldn't be. Combine wouldn't look across basic block boundaries either AFAIK. E.g. compiling: typedef unsigned long long ull; typedef unsigned int ui; ull foo (ui x, ui y, ull z) { ull tmp = (ull) x * y; return tmp + z; } with a recent snapshot and "-O2 -fdump-tree-all" shows that the final_cleanup dump does indeed have the combined form: ;; Function foo (foo) foo (x, y, z) { : return (long long unsigned int) y * (long long unsigned int) x + z; } Ah I see. Fair enough. I realise no-one else has spoken out in support of me, so perhaps I'm in a minority of one here. But it does seem to me that in the Tree-SSA world, it makes less sense to duplicate standard optabs in the backend purely for the reason of keeping DImode arithmetic around as DImode arithmetic for longer. in the short term we really do need to reenable madd/msub for MIPS32 targets in GCC 4. We could do that with a local patch to put back adddi3, but it would be better if we could coordinate this with you. If removing the patterns had been purely a clean-up, I would be more open to the idea of putting the patterns back. But given that removing the patterns had an optimisation benefit of its own, I'm less open to the idea of adding them back, especially when (as far as I'm concerned) there's a clean way of getting the best of both worlds. OK, so maybe as the person who removed adddi3 from the MIPS backend, and the main proponent of the new fused opcodes, you get to volunteer to implement this? :) In the meantime the performance gain from being able to use a widening madd is more important to us than the benefit of improved optimisation of 64-bit addition, so we'll probably have to put adddi3 back in as a local patch. Nigel
Re: [MIPS] MADD issue
Richard Sandiford wrote: Nigel Stephens <[EMAIL PROTECTED]> writes: While I agree with you philosophically, it feels like (b) might be quite a major task. A number of optimisation passes which currently recognise and MUL and PLUS separately (e.g. loop strength reduction) would now need to be extended to handle the fused MULPLUS and MULSUB operators. And although the reduction in instruction count due to your previous change is good, what is it as a percentage of the total? After all it only helps code which uses 64-bit integer types with a 32-bit ABI, which is probably quite a small proportion of most real-life applications -- whereas for some algorithms the ability to use MADD is absolutely critical to performance, and for them losing the ability to generate MADD is a significant backward step for the compiler. How about, as a workaround until (b) sees the light of day, we reimplement adddi3 and subdi3 only (not the other di mode patterns), qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented more cleanly nowadays, using define_insn_and_split and/or a "#" template, to avoid generating multi-instruction assembler sequences. The old patterns had a define_split too. That wasn't really the problem. If you don't want to add a tree code yet, it would still be possible to add the optab and expand support, recognising mult-add sequences in a similar way to how we recognise widening multiplies now. I feel at least that's a step in the right direction. Richard I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, mcore, score, arm & pa backends still implement adddi3 as either a define_insn which outputs two instructions or an explicit define_expand followed define_split and associated sub patterns. Are we setting the bar too high for MIPS? :) Whilst I'm sure that your proposal is the right one going forward, it still feels like it could be significant amount of work to implement. And the simplified optab/expand support would only work for multiply-add sequences expressed within a single expression, and wouldn't be able to optimise disjoint multiply, then add expressions. Or have I missed something. in the short term we really do need to reenable madd/msub for MIPS32 targets in GCC 4. We could do that with a local patch to put back adddi3, but it would be better if we could coordinate this with you. Nigel
Re: [MIPS64] Problems when Passing TImode Parameters in EABI
Fu, Chao-Ying wrote: > Hello, > > I think there is a bug in mips_pass_by_reference when the mips abi > is EABI to pass TImode parameters. > > > The following code is from the mainline GCC "mips.c". > - > mips_pass_by_reference (CUMULATIVE_ARGS *cum ATTRIBUTE_UNUSED, > enum machine_mode mode, tree type, > bool named ATTRIBUTE_UNUSED) > { > if (mips_abi == ABI_EABI) > { > int size; > > /* ??? How should SCmode be handled? */ > if (type == NULL_TREE || mode == DImode || mode == DFmode) > return 0; > > size = int_size_in_bytes (type); > return size == -1 || size > UNITS_PER_WORD; > } > else > { > /* If we have a variable-sized parameter, we have no choice. */ > return targetm.calls.must_pass_in_stack (mode, type); > } > } > - > When "type" is NULL_TREE, this function returns "0". But when "type" is > not NULL and > the size of TImode is bigger than UNITS_PER_WORD, so it returns "1". > This causes inconsistency in the following test. > > # cat timode.c > typedef int TItype __attribute__ ((mode (TI))); > TItype test1 (TItype a) > { > return a*a; > } > > # mipsisa64-elf-gcc -S timode.c -O3 > > ### timiode.s > test1: > daddiu $sp,$sp,-8 > sd $31,0($sp) > ld $3,8($4) <- Parameter in stack > ld $2,0($4) <- Parameter in stack > move$7,$3 < Parameter in register > move$4,$2 < Parameter in register > move$5,$3 < Parameter in register > jal __multi3 > move$6,$2 < Parameter in register > > ld $31,0($sp) > move$4,$2 > move$2,$4 > j $31 > daddiu $sp,$sp,8 > > Does someone know how to pass TImode parameters in EABI? Thanks! > Hi Chao-ying MIPS ABIs do not usually document the behaviour of 128 bit scalars, since such types are not defined by most language standards, and are typically only available via compiler-specific extensions. So if TImode arguments work correctly on any ABI, then I would guess that it might be more by luck than design! EABI is one of the more sketchily specified MIPS ABIs -- the 64-bit variant probably even more so. You could fix the mips.c back-end to handle TImode consistently for EABI and post those changes to the appropriate list for comment. But I wonder if you really need to use EABI, or are only using it because it's the default for a mipsisa64-elf configuration. If the latter is true, then you could try using the N32 ABI and see if that works better for you (i.e. use the --with-abi=n32 argument when configuring your toolchain). FWIW our own gcc-3.4 mips-sde-elf configurations default to using N32 for 64-bit ISAs, though we've not yet tested that on gcc-4. Nigel
Re: Fixed-point branch?
Mark Mitchell wrote: Chao-Ying, I'm also interested in whether or not these changes have any impact on C++. With your changes, does GNU C++ now accept any fixed-point constructs? Chao-ying's on vacation this week. AFAIK Chao-ying's code does nothing explicit to support, or not support, C++. The front-end code was based on the Decimal floating point extension, so should currently behave in a similar fashion to that with regards C++. If so, are you aware of any effort to standardize any of this functionality in C++? Annex F (page 96) of the N1169 spec (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf) does give some "guidelines" for C++ usage, but only a very hand-wavy couple of paragraphs. I guess we can ask the committee members if there is any work going on in this area. (I think that my preference, in the short term, would be to disable this functionality in C++ -- although, of course, we will eventually want to turn it on so that GNU C++ is as much as possible a superset of GNU C.) OK Nigel