[Bug target/55295] [SH] Add support for fipr instruction

2013-03-04 Thread turkeyman at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295



Manu Evans  changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |turkeyman at gmail dot com



--- Comment #2 from Manu Evans  2013-03-04 16:22:29 UTC ---

+1



I'm seeing the same pattern.

In fact, I'm noticing a lot of my maths code seems to be performing a lot of

redundant moves.



Are there actually any builtins/intrinsics available for the SH4?

How do I access the awesome vector operations without breaking out the inline

asm?



It would be nice to have some intrinsics that understand vectors as sequences

of 4 float regs, and automate a sequential (vector) load.



Also, the ftrv opcode doesn't seem to be accessible either.


[Bug target/55295] [SH] Add support for fipr instruction

2013-03-04 Thread turkeyman at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295



--- Comment #4 from Manu Evans  2013-03-05 01:55:08 UTC ---

(In reply to comment #3)

> (In reply to comment #2)

> > +1

> > 

> > I'm seeing the same pattern.

> > In fact, I'm noticing a lot of my maths code seems to be performing a lot of

> > redundant moves.

> 

> Some examples would be great regarding this matter, although I can already

> imagine what the code looks like.  One of the problems is the auto-inc-dec pass

> (see PR 50749).  A long time ago the rule of thumb for SH4 programmers was

> "read float values with post-inc addressing in your C code, and write float

> values with pre-dec addressing".  This does not work anymore, since all memory

> accesses are turned into array like index based addresses internally in the

> compiler.  Then the auto-inc-dec RTL pass is supposed to find post-inc and

> pre-dec addressing mode opportunities, but it fails to do so in most cases.

> I have started writing a replacement RTL pass that would try to optimize

> addressing mode selections.  I hope to get it in for GCC 4.9.

> 

> Anyway, if you have some example code that you can share, it would be really

> appreciated and helpful during development for testing purposes.

> 

> > Are there actually any builtins/intrinsics available for the SH4?

> > How do I access the awesome vector operations without breaking out the inline

> > asm?

> 

> There aren't that many HW vector ops on SH4, just fipr and ftrv.  At the

> moment, there are no builtins for those, so you'd have to use inline asm

> intrinsics.  Like I mentioned in comment #1, I'd rather make the compiler

> figure out opportunities from portable generic code.  Although for ftrv the

> patterns might be a bit complicated, also because the compiler then has to

> manage the 2nd FPU regs bank...

>

> > It would be nice to have some intrinsics that understand vectors as sequences

> > of 4 float regs, and automate a sequential (vector) load.

> 

> That would be the job of the address-mode-selection RTL pass.  It would also

> improve overall code quality on SH.  The fastest way to load 4 float vectors is

> to use 2x fmov.d.  The compiler could also do that automatically, but this

> requires FPSCR switching, which unfortunately also needs some rework (e.g. see

> PR 53513, PR 6526).

> 

> And on top of that, we also have PR 13423.  It seems that the proper fix for

> this is a new reworked (vector) ABI for SH.



Well I hope you find the time for all this, the (small) sh4 community will love

you! :)



Why is a new ABI important?





> > Also, the ftrv opcode doesn't seem to be accessible either.

> 

> True.  I really hope that I'll find enough time to brush up SH FPU code

> generation for GCC 4.9.  Until then, I'd suggest to use inline-asm style

> intrinsics.



4.9? That sounds like it could be years off... :(



I'm not sure what you mean by 'inline-asm style intrinsics'?

Last time I used inline-asm blocks in GCC it totally broke the optimisation. It

wouldn't reorder across inline-asm blocks, and it couldn't eliminate any

redundant load/stores appearing within the block in the event the value was

already resident.



Can you give me a small demonstration of what you mean?

I found whenever I touch inline-asm, the block just grows and grows in scope

upwards until my whole tight routine is written in asm... but that was some

years back, GCC3 era.
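
For reference, an "inline-asm style intrinsic" is usually taken to mean a small inline wrapper around a single instruction, roughly like the sketch below. This is not a GCC builtin: the name sh4_fipr is made up for illustration, and the SH4 branch assumes that the port accepts the "f" constraint with local register variables bound to fr0..fr7 and that FPSCR.PR is already 0 (as with -m4-single or -m4-single-only). On any other target the plain C fallback is used, so the same source still builds elsewhere.

```c
/* Sketch only: sh4_fipr is a hypothetical helper, not a GCC builtin. */
static inline float sh4_fipr(float x, float y, float z, float w,
                             float a, float b, float c, float d)
{
#if defined(__SH4_SINGLE__) || defined(__SH4_SINGLE_ONLY__)
    /* Assumption: FPSCR.PR == 0 here, which fipr requires. */
    /* Pin the operands to fv0 (fr0..fr3) and fv4 (fr4..fr7). */
    register float rx __asm__("fr0") = x;
    register float ry __asm__("fr1") = y;
    register float rz __asm__("fr2") = z;
    register float rw __asm__("fr3") = w;
    register float ra __asm__("fr4") = a;
    register float rb __asm__("fr5") = b;
    register float rc __asm__("fr6") = c;
    register float rd __asm__("fr7") = d;

    /* fipr fv4,fv0 leaves the inner product in fr3, the last lane of fv0. */
    __asm__("fipr fv4,fv0"
            : "+f"(rw)
            : "f"(rx), "f"(ry), "f"(rz),
              "f"(ra), "f"(rb), "f"(rc), "f"(rd));
    return rw;
#else
    /* Portable fallback: the same 4-element inner product in plain C. */
    return x * a + y * b + z * c + w * d;
#endif
}
```

Because the asm statement has an output operand and is not marked volatile, GCC is free to schedule it among the surrounding code or drop it when the result is unused, which should avoid at least some of the "asm block keeps growing" effect described above.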





I'll report examples here as I find compelling situations.



But on a tangent, can you explain this behaviour? It's really ruining my code:



float testfunc(float v, float v2)
{
    return v*v2 + v;
}



Compiled with: -O3 -mfused-madd



testfunc:
.LFB1:
        .cfi_startproc
        mov.l   .L3,r1
        lds.l   @r1+,fpscr      ; <- why does it mess with fpscr?
        add     #-4,r1
        fmov    fr5,fr0
        add     #4,r1           ; <- +4 after -4... redundant?
        fmac    fr0,fr4,fr0
        rts
        lds.l   @r1+,fpscr
.L4:
        .align 2
.L3:
        .long   __fpscr_values
        .cfi_endproc



There's a lot of rubbish in there... I expect:



testfunc:
.LFB1:
        .cfi_startproc
        fmov    fr5,fr0
        fmac    fr0,fr4,fr0
        rts
        .cfi_endproc





I'm also noticing that -ffast-math is inhibiting fmac emission in some cases:



Compiled with: -O3 -mfused-madd -ffast-math



testfunc:
.LFB1:
        .cfi_startproc
        mov.l   .L3,r1
        lds.l   @r1+,fpscr
        fldi1   fr0             ; what is a 1.0 doing here?
        add     #-4,r1
        add     #4,r1
        fadd    fr4,fr0         ; v+1 ??
        fmul    fr5,fr0         ; (v+1)*v2 ?? That's not what the code does...
        rts
        lds.l   @r1+,fpscr



What's going on there? That doesn't even look correct...
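
One possible reading, offered as a guess rather than a diagnosis: -ffast-math permits reassociation of floating-point expressions, and v*v2 + v factors into (v2 + 1)*v, which matches the fldi1/fadd/fmul shape above. That only lines up with the listing if the two float arguments arrive swapped within the register pair (v in fr5, v2 in fr4), which is an assumption about the little-endian SH4 calling convention and not something shown in this thread. The function names below are invented for the comparison.

```c
/* The expression exactly as written in testfunc. */
float as_written(float v, float v2)
{
    return v * v2 + v;
}

/* The factored form that a reassociating compiler may choose instead.
 * Algebraically equal, but it can round differently for some inputs,
 * which is why it is only permitted under -ffast-math style options. */
float as_factored(float v, float v2)
{
    return (v2 + 1.0f) * v;
}
```

For small exactly representable inputs such as v = 2 and v2 = 3 both forms give 8; the difference only shows up in the last bits for less friendly values.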



Cheers!


[Bug target/55295] [SH] Add support for fipr instruction

2013-03-05 Thread turkeyman at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295



--- Comment #6 from Manu Evans  2013-03-05 12:53:26 UTC ---

Awesome, thanks for the info and help!



Strange, -m4-single won't work with my toolchain, it says 'not compatible with

this configuration' >_<



Looking forward to all these fixes! :)


[Bug target/56592] [SH] Add vector ABI

2013-03-14 Thread turkeyman at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56592



Manu Evans  changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |turkeyman at gmail dot com



--- Comment #1 from Manu Evans  2013-03-14 09:48:17 UTC ---

I watch with keen anticipation! :)


[Bug target/55295] [SH] Add support for fipr instruction

2015-03-01 Thread turkeyman at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295

--- Comment #12 from Manu Evans  ---
Hey, I'm still following this with great interest.

Is it possible to make an intrinsic for this instruction so it can be issued at
will?

What I'm even more interested in at this point is support for passing vectors
in registers, which would make it possible to eliminate so much of that fmov
noise.
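
As an illustration of the kind of source a vector ABI would be dealing with, here is a minimal sketch using GCC's generic vector extension (the vector_size attribute). The names v4sf and dot4 are chosen for this example only. Whether the SH back end would pass such a value in fv registers, or match the reduction to fipr, is exactly the open question in this report and is not something the sketch demonstrates.

```c
/* Four packed floats via GCC's generic vector extension. */
typedef float v4sf __attribute__((vector_size(16)));

/* Portable 4-element inner product written on whole vectors.
 * dot4 is a hypothetical name used only for this illustration. */
static inline float dot4(v4sf a, v4sf b)
{
    v4sf p = a * b;                     /* element-wise multiply */
    return p[0] + p[1] + p[2] + p[3];   /* horizontal sum */
}
```

Both arguments are passed by value as whole vectors, so this is the case where the calling convention decides whether the data stays in float registers or takes a detour through memory.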