http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295



--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> 2013-03-05 12:28:22 UTC ---

(In reply to comment #4)
>
> Why is a new ABI important?
>



Because currently, there is no way to pass something like

struct { float x, y, z, w; };

as a function argument in registers, although the default SH ABI could allow
passing up to 3 such vectors.  The same applies to

typedef float v4sf __attribute__ ((vector_size (16)));

or

std::array<float, 4>

However, code that does that will be incompatible with existing calling
conventions etc, hence a new (additional and optional) ABI.
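To make the three cases concrete, here is a minimal sketch of functions that take each of the 16-byte types by value.  The names vec4, arr4, sum_struct, sum_vector, sum_array and check_sums are made up for illustration.  Whether such arguments end up in registers depends entirely on the target ABI; as said above, the existing SH calling conventions do not do that.

```cpp
#include <array>
#include <cassert>

// The three 4-float types mentioned above (vec4 and arr4 are illustrative names).
struct vec4 { float x, y, z, w; };
typedef float v4sf __attribute__ ((vector_size (16)));  // GCC vector extension
using arr4 = std::array<float, 4>;

// Hypothetical helpers, each taking one 16-byte vector by value.
float sum_struct (vec4 v) { return v.x + v.y + v.z + v.w; }
float sum_vector (v4sf v) { return v[0] + v[1] + v[2] + v[3]; }
float sum_array (arr4 v) { return v[0] + v[1] + v[2] + v[3]; }

// Small self-check: the same four values passed in all three forms.
void check_sums (void)
{
  vec4 a = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf b = { 1.0f, 2.0f, 3.0f, 4.0f };
  arr4 c = {{ 1.0f, 2.0f, 3.0f, 4.0f }};
  assert (sum_struct (a) == 10.0f);
  assert (sum_vector (b) == 10.0f);
  assert (sum_array (c) == 10.0f);
}
```

Compiling something like this with -S and looking at the call sites is a quick way to see how a given ABI actually passes each type.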



> 4.9? That sounds like it could be years off... :(



4.8 is about to be released soon.  4.9 should follow at around the same time
next year.  Of course you can still grab the current development version and
use it anytime.



> 

> I'm not sure what you mean by 'inline-asm style intrinsics'?



Something like:



static inline void* get_gbr (void) throw ()
{
  void* retval;
  __asm__ volatile ("stc gbr, %0" : "=r" (retval) : );
  return retval;
}







> Last time I used inline-asm blocks in GCC it totally broke the optimisation.
> It wouldn't reorder across inline-asm blocks, and it couldn't eliminate any
> redundant load/stores appearing within the block in the event the value was
> already resident.
>
> Can you give me a small demonstration of what you mean?
> I found whenever I touch inline-asm, the block just grows and grows in scope
> upwards until my whole tight routine is written in asm... but that was some
> years back, GCC3 era.
>



Yes, there are some limits to what the compiler can do with an asm block.  It
won't analyze the contents of the asm block, only the placeholders.  Thus it
usually can't eliminate redundant loads/stores.
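The "only the placeholders" point can be shown without any SH insns at all.  The sketch below uses an empty asm template, so it should build with GCC on any target.  The names opaque, add_folded, add_opaque and check_opaque are made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: an asm statement with an empty template.
// The "+r" operand says x is read and written in a register.  The compiler
// does not look inside the template, so it has to assume x was modified,
// even though no instruction is emitted here.
static inline int opaque (int x)
{
  __asm__ ("" : "+r" (x));
  return x;
}

// Plain arithmetic: the compiler is free to fold this to the constant 5.
int add_folded (void) { return 2 + 3; }

// Same arithmetic through the asm operands: both values are treated as
// unknown after the asm statements, so the addition is done at run time.
int add_opaque (void) { return opaque (2) + opaque (3); }

// Both variants still compute the same result.
void check_opaque (void)
{
  assert (add_folded () == 5);
  assert (add_opaque () == 5);
}
```

Comparing the -O2 output of add_folded and add_opaque shows the difference: what the compiler knows about an asm block comes from the operand constraints, not from the text of the template.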





>
> I'll report examples here as I find compelling situations.
>
> But on a tangent, can you explain this behaviour? It's really ruining my code:
>
> float testfunc(float v, float v2)
> {
>     return v*v2 + v;
> }
>
> Compiled with: -O3 -mfused-madd
>
> testfunc:
> .LFB1:
>     .cfi_startproc
>     mov.l    .L3,r1      ;
>     lds.l    @r1+,fpscr  ; <- why does it mess with fpscr?
>     add    #-4,r1
>     fmov    fr5,fr0
>     add    #4,r1       ; <- +4 after -4... redundant?
>     fmac    fr0,fr4,fr0
>     rts    
>     lds.l    @r1+,fpscr
> .L4:
>     .align 2
> .L3:
>     .long    __fpscr_values
>     .cfi_endproc
>
> There's a lot of rubbish in there... I expect:
>
> testfunc:
> .LFB1:
>     .cfi_startproc
>     fmov    fr5,fr0
>     fmac    fr0,fr4,fr0
>     rts    
>     .cfi_endproc
>



The FPSCR value is changed because its default setting is to operate on
double-precision float values, so it has to be switched for the
single-precision operations and switched back afterwards.  This is the default
configuration of the compiler.  You can change it by using e.g. -m4-single,
which will assume that the FPSCR setting is configured for single-precision at
function entry/return.

The +4 -4 thing is a known problem and stems from the fact that the FPSCR
load/store insns are available only as post-inc/pre-dec.
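To see why the add #-4 / add #4 pair cancels out, the FPSCR-related insns of the quoted listing can be replayed in a small model.  This is only a sketch: Regs, Trace, lds_postinc, add_bytes and replay are made-up names, the table stands in for the two consecutive words that the listing reads through __fpscr_values, and its contents are arbitrary placeholders.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two registers touched by the quoted listing.
struct Regs
{
  const std::uint32_t* r1;   // address register
  std::uint32_t fpscr;       // value last loaded into FPSCR
};

// Models "lds.l @r1+,fpscr": load through r1, then r1 advances by one word.
static void lds_postinc (Regs& regs)
{
  regs.fpscr = *regs.r1;
  regs.r1 += 1;
}

// Models "add #imm,r1" for a byte offset that is a multiple of 4.
static void add_bytes (Regs& regs, int bytes)
{
  regs.r1 += bytes / 4;
}

// FPSCR contents after the first and after the second load.
struct Trace { std::uint32_t first, second; };

// Replays the FPSCR-related insns of the listing in program order.
Trace replay (const std::uint32_t (&table)[2])
{
  Regs regs = { table, 0 };
  lds_postinc (regs);                  // lds.l @r1+,fpscr
  std::uint32_t first = regs.fpscr;
  add_bytes (regs, -4);                // add #-4,r1   (undoes the post-inc)
  assert (regs.r1 == table);
  add_bytes (regs, 4);                 // add #4,r1    (redoes it)
  lds_postinc (regs);                  // lds.l @r1+,fpscr in the rts delay slot
  return { first, regs.fpscr };
}
```

In this model the second load reads the second table entry whether or not the two adds are present, which is why they look redundant in the listing.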



>
> I'm also noticing that -ffast-math is inhibiting fmac emission in some cases:
>
> Compiled with: -O3 -mfused-madd -ffast-math
>
> testfunc:
> .LFB1:
>     .cfi_startproc
>     mov.l    .L3,r1
>     lds.l    @r1+,fpscr
>     fldi1    fr0         ; what is a 1.0 doing here?
>     add    #-4,r1
>     add    #4,r1
>     fadd    fr4,fr0     ; v+1 ??
>     fmul    fr5,fr0     ; (v+1)*v2 ?? That's not what the code does...
>     rts    
>     lds.l    @r1+,fpscr
>
> What's going on there? That doesn't even look correct...



The transformation is legitimate, although unfortunate, since using fmac would
be better in this case.

The original expression 'v*v2 + v' is converted to '(1 + v2)*v' and that's what
the code does.  Probably you compiled for little endian and got confused by the
floating point register ordering for arguments.  It goes like ...
fr5 = arg 0
fr4 = arg 1
fr7 = arg 2
fr6 = arg 3
...
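The two forms and the register assignment can be checked with a small sketch.  The names original, reassociated, fr_for_arg and check_forms are made up for illustration, and the table covers only the four registers listed above.  The two expressions are algebraically equal but can round differently in general, which is presumably why the rewrite showed up only with -ffast-math.

```cpp
#include <cassert>

// The expression as written in testfunc.
float original (float v, float v2) { return v * v2 + v; }

// The form the -ffast-math listing computes: fldi1 / fadd / fmul.
float reassociated (float v, float v2) { return (1.0f + v2) * v; }

// Register numbers for the first four float arguments as listed above:
// index = argument number, value = N in frN.
static const int fr_for_arg[4] = { 5, 4, 7, 6 };

// With small exactly representable inputs both forms give the same value.
void check_forms (void)
{
  assert (original (2.0f, 3.0f) == 8.0f);
  assert (reassociated (2.0f, 3.0f) == 8.0f);
  // v is arg 0 and lives in fr5, v2 is arg 1 and lives in fr4, so
  // "fadd fr4,fr0" is 1 + v2 and "fmul fr5,fr0" multiplies that by v.
  assert (fr_for_arg[0] == 5);
  assert (fr_for_arg[1] == 4);
}
```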



This is another reason for adding a new ABI, BTW.
