Re: [MIPS] MADD issue

2007-04-13 Thread Nigel Stephens



Richard Sandiford wrote:


As far as madd goes, I think it would be better to either
(a) get combine to handle this situation or (b) get expand
to generate a fused multiply-add from the outset.

(b) sounds like it might be useful in its own right.  At the moment we
treat the generation of floating-point multiply-adds as an optimisation,
but in some applications it's critical not to round the intermediate
result.  (I don't know if there's a bugzilla entry about this.)
If we treated fused multiply-add as a primitive operation, we could
extend it to integer types too.  In this case we'd also need to
handle widening multiplications, but we already need to do that
for stand-alone multiplications.
  


Richard

While I agree with you philosophically, it feels like (b) might be quite 
a major task. A number of optimisation passes which currently recognise 
and MUL and PLUS separately (e.g. loop strength reduction) would now 
need to be extended to handle the fused MULPLUS and MULSUB operators.


And although the reduction in instruction count due to your previous 
change is good, what is it as a percentage of the total? After all it 
only helps code which uses 64-bit integer types with a 32-bit ABI, which 
is probably quite a small proportion of most real-life applications -- 
whereas for some algorithms the ability to use MADD is absolutely 
critical to performance, and for them losing the ability to generate 
MADD is a significant backward step for the compiler.


How about, as a workaround until (b) sees the light of day, we 
reimplement adddi3 and subdi3 only (not the other di mode patterns), 
qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented 
more cleanly nowadays, using  define_insn_and_split and/or a "#" 
template, to avoid generating multi-instruction assembler sequences.


Nigel


Re: [MIPS] MADD issue

2007-04-13 Thread Nigel Stephens



Richard Sandiford wrote:

Nigel Stephens <[EMAIL PROTECTED]> writes:
  
While I agree with you philosophically, it feels like (b) might be quite 
a major task. A number of optimisation passes which currently recognise 
and MUL and PLUS separately (e.g. loop strength reduction) would now 
need to be extended to handle the fused MULPLUS and MULSUB operators.


And although the reduction in instruction count due to your previous 
change is good, what is it as a percentage of the total? After all it 
only helps code which uses 64-bit integer types with a 32-bit ABI, which 
is probably quite a small proportion of most real-life applications -- 
whereas for some algorithms the ability to use MADD is absolutely 
critical to performance, and for them losing the ability to generate 
MADD is a significant backward step for the compiler.


How about, as a workaround until (b) sees the light of day, we 
reimplement adddi3 and subdi3 only (not the other di mode patterns), 
qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented 
more cleanly nowadays, using  define_insn_and_split and/or a "#" 
template, to avoid generating multi-instruction assembler sequences.



The old patterns had a define_split too.  That wasn't really the problem.

If you don't want to add a tree code yet, it would still be possible to
add the optab and expand support, recognising mult-add sequences in a
similar way to how we recognise widening multiplies now.  I feel at
least that's a step in the right direction.
  


OK, we'll have a think about that.

Thanks

Nigel


Re: [MIPS] MADD issue

2007-04-20 Thread Nigel Stephens



Richard Sandiford wrote:

Nigel Stephens <[EMAIL PROTECTED]> writes:
  
I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, 
mcore, score, arm & pa backends still implement adddi3 as either a 
define_insn which outputs two instructions or an explicit define_expand 
followed define_split and associated sub patterns.  Are we setting the 
bar too high for MIPS? :)



I don't think that follows.  The main reason that ports like rs6000,
i386, arm, sparc and pa define adddi3 is because those architectures
provide special add-with-carry type instructions, or similar
architecture-specific optimisations.  


Right, good point.


MIPS has nothing like that.
  


Actually the MIPS DSP ASE does have addsc and addwc, which could be used 
for this purpose. Sadly not subsc and subwc, though.



The old MIPS patterns just re-implemented the standard optabs version
(and often did so less efficiently, as I said before).

  
Whilst I'm sure that your proposal is the right one going forward, it 
still feels like it could be significant amount of work to implement. 
And the simplified optab/expand support would only work for multiply-add 
sequences expressed within a single expression, and wouldn't be able to 
optimise disjoint multiply, then add expressions. Or have I missed 
something.



I don't think that follows either.  Out-of-ssa should coalesce them
if they are in the same basic block.  And if they aren't in the
same basic block, that's presumably because the tree optimisers
think they shouldn't be.  Combine wouldn't look across basic
block boundaries either AFAIK.

E.g. compiling:


typedef unsigned long long ull;
typedef unsigned int ui;

ull foo (ui x, ui y, ull z)
{
  ull tmp = (ull) x * y;
  return tmp + z;
}


with a recent snapshot and "-O2 -fdump-tree-all" shows that the
final_cleanup dump does indeed have the combined form:


;; Function foo (foo)

foo (x, y, z)
{
:
  return (long long unsigned int) y * (long long unsigned int) x + z;

}


  


Ah I see. Fair enough.


I realise no-one else has spoken out in support of me, so perhaps
I'm in a minority of one here.  But it does seem to me that in the
Tree-SSA world, it makes less sense to duplicate standard optabs
in the backend purely for the reason of keeping DImode arithmetic
around as DImode arithmetic for longer.

  
in the short term we really do need to reenable madd/msub for MIPS32 
targets in GCC 4. We could do that with a local patch to put back 
adddi3, but it would be better if we could coordinate this with you.



If removing the patterns had been purely a clean-up, I would be more
open to the idea of putting the patterns back.  But given that removing
the patterns had an optimisation benefit of its own, I'm less open to
the idea of adding them back, especially when (as far as I'm concerned)
there's a clean way of getting the best of both worlds.

  


OK, so maybe as the person who removed adddi3 from the MIPS backend, and 
the main  proponent of the new fused opcodes, you get to volunteer to 
implement this? :)


In the meantime the performance gain from being able to use a widening 
madd is more important to us than the benefit of improved optimisation 
of 64-bit addition, so we'll probably have to put adddi3 back in as a 
local patch.


Nigel



Re: [MIPS] MADD issue

2007-04-20 Thread Nigel Stephens



Richard Sandiford wrote:

Nigel Stephens <[EMAIL PROTECTED]> writes:
  
While I agree with you philosophically, it feels like (b) might be quite 
a major task. A number of optimisation passes which currently recognise 
and MUL and PLUS separately (e.g. loop strength reduction) would now 
need to be extended to handle the fused MULPLUS and MULSUB operators.


And although the reduction in instruction count due to your previous 
change is good, what is it as a percentage of the total? After all it 
only helps code which uses 64-bit integer types with a 32-bit ABI, which 
is probably quite a small proportion of most real-life applications -- 
whereas for some algorithms the ability to use MADD is absolutely 
critical to performance, and for them losing the ability to generate 
MADD is a significant backward step for the compiler.


How about, as a workaround until (b) sees the light of day, we 
reimplement adddi3 and subdi3 only (not the other di mode patterns), 
qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented 
more cleanly nowadays, using  define_insn_and_split and/or a "#" 
template, to avoid generating multi-instruction assembler sequences.



The old patterns had a define_split too.  That wasn't really the problem.

If you don't want to add a tree code yet, it would still be possible to
add the optab and expand support, recognising mult-add sequences in a
similar way to how we recognise widening multiplies now.  I feel at
least that's a step in the right direction.

  


Richard

I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, 
mcore, score, arm & pa backends still implement adddi3 as either a 
define_insn which outputs two instructions or an explicit define_expand 
followed define_split and associated sub patterns.  Are we setting the 
bar too high for MIPS? :)


Whilst I'm sure that your proposal is the right one going forward, it 
still feels like it could be significant amount of work to implement. 
And the simplified optab/expand support would only work for multiply-add 
sequences expressed within a single expression, and wouldn't be able to 
optimise disjoint multiply, then add expressions. Or have I missed 
something.


in the short term we really do need to reenable madd/msub for MIPS32 
targets in GCC 4. We could do that with a local patch to put back 
adddi3, but it would be better if we could coordinate this with you.


Nigel



Re: [MIPS64] Problems when Passing TImode Parameters in EABI

2006-10-05 Thread Nigel Stephens


Fu, Chao-Ying wrote:
> Hello,
>
>   I think there is a bug in mips_pass_by_reference when the mips abi
> is EABI to pass TImode parameters.
>
>   


>   The following code is from the mainline GCC "mips.c".
> -
> mips_pass_by_reference (CUMULATIVE_ARGS *cum ATTRIBUTE_UNUSED,
> enum machine_mode mode, tree type,
> bool named ATTRIBUTE_UNUSED)
> {
>   if (mips_abi == ABI_EABI)
> {
>   int size;
>
>   /* ??? How should SCmode be handled?  */
>   if (type == NULL_TREE || mode == DImode || mode == DFmode)
> return 0;
>
>   size = int_size_in_bytes (type);
>   return size == -1 || size > UNITS_PER_WORD;
> }
>   else
> {
>   /* If we have a variable-sized parameter, we have no choice.  */
>   return targetm.calls.must_pass_in_stack (mode, type);
> }
> }
> -
>   When "type" is NULL_TREE, this function returns "0".  But when "type" is 
> not NULL and
> the size of TImode is bigger than UNITS_PER_WORD, so it returns "1".
> This causes inconsistency in the following test.
>
> # cat timode.c
> typedef int TItype __attribute__ ((mode (TI)));
> TItype test1 (TItype a)
> {
>   return a*a;
> }
>
> # mipsisa64-elf-gcc -S timode.c -O3
>
> ### timiode.s  
> test1:
> daddiu  $sp,$sp,-8
> sd  $31,0($sp)
> ld  $3,8($4) <- Parameter in stack
> ld  $2,0($4) <- Parameter in stack
> move$7,$3  < Parameter in register
> move$4,$2  < Parameter in register
> move$5,$3  < Parameter in register
> jal __multi3
> move$6,$2  < Parameter in register
>
> ld  $31,0($sp)
> move$4,$2
> move$2,$4
> j   $31
> daddiu  $sp,$sp,8
>
>   Does someone know how to pass TImode parameters in EABI?  Thanks!
>   

Hi Chao-ying

MIPS ABIs do not usually document the behaviour of 128 bit scalars,
since such types are not defined by most language standards, and are
typically only available via compiler-specific extensions. So if TImode
arguments  work correctly on any ABI, then I would guess that it might
be more by luck than design! EABI is one of the more sketchily specified
MIPS ABIs -- the 64-bit variant probably even more so.

You could fix the mips.c back-end to handle TImode consistently for EABI
and post those changes to the appropriate list for comment. But I wonder
if you really need to use EABI, or are only using it because it's the
default for a mipsisa64-elf configuration. If the latter is true, then
you could try using the N32 ABI and see if that works better for you
(i.e. use the --with-abi=n32 argument when configuring your toolchain).
FWIW our own gcc-3.4 mips-sde-elf configurations default to using N32
for 64-bit ISAs, though we've not yet tested that on gcc-4.

Nigel


Re: Fixed-point branch?

2007-05-29 Thread Nigel Stephens



Mark Mitchell wrote:

Chao-Ying, I'm also interested in whether or not these changes have any
impact on C++.  With your changes, does GNU C++ now accept any
fixed-point constructs? 


Chao-ying's on vacation this week. AFAIK Chao-ying's code does nothing explicit 
to support, or not support, C++. The front-end code was based on the Decimal 
floating point extension, so should currently behave in a similar fashion to 
that with regards C++.



 If so, are you aware of any effort to
standardize any of this functionality in C++?  


Annex F (page 96) of the N1169 spec 
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf) does give some 
"guidelines" for C++ usage, but only a very hand-wavy couple of paragraphs. I 
guess we can ask the committee members if there is any work going on in this area.



(I think that my
preference, in the short term, would be to disable this functionality in
C++ -- although, of course, we will eventually want to turn it on so
that GNU C++ is as much as possible a superset of GNU C.)

  


OK

Nigel