date:20160119

Re: Instruction scheduling for the R5900's 2 integer pipelines

2016-01-19 Thread Jeff Law


On 01/19/2016 05:04 AM, Woon yung Liu wrote:

> Hi,
>
> I am trying to complete support for the MIPS R5900 by adding support
> for its second integer multiplication/division pipe. GCC currently
> supports only the first one. My target at this moment is the public
> GCC v5.3.0 release.
>
> To get the 2nd pipeline supported, I've added the hi1 and lo1
> registers to GCC, as well as constraints for them (wr for hi1lo1 and
> wl for lo1). The existing instructions in mips.md have been modified
> to use the new constraints as new alternatives.
>
> A new constraint modifier was added too, which appends a 1 to the
> instruction (i.e. changes mult to mult1) if it detects that the
> specified operand is for pipeline 1 instead of 0.
>
> The 2nd pipeline is utilized by using different instructions (i.e.
> mult1 instead of mult, as mult is for the 1st pipeline) and registers
> (i.e. lo1 and hi1, instead of lo and hi).
>
> Right now, I know that it is possible for GCC to output the new
> instructions for the 2nd pipeline if I manipulate the MD constraints
> for instructions like mult... but GCC doesn't seem to ever use the
> 2nd pipeline on its own otherwise.
>
> I originally believed that it's because I didn't add a pipeline
> description into my MD file (5900.md), but nothing seemed to change
> after I did that.
>
> I followed the documentation on the pipeline description, but I
> realized that I still don't understand how the automaton will tell
> GCC which alternative (and hence which integer pipe) to use, and so I
> don't think there's a relationship between the automaton and the two
> different sets of multiplication/division instructions yet.
>
> Could somebody please advise me on how to get this going? Or at
> least, tell me which other target has two integer pipelines that are
> used in this way, so that I will have something to reference?

AFAIK, no other MIPS processor has the same second-pipeline design as the R5900.
There was a time where GCC would generate mult/multiply-add instructions 
that would issue into the 2nd R5900 pipeline.


It's been 15+ years since I looked at that problem, but IIRC I twiddled 
the old register allocator, along with the expected changes in the 
pipeline and constraints for the mul/mul-add insns in the mips backend 
to exploit the dual multiply pipes on the r5900.


The key was to realize that because selection of the pipeline is static 
based on the registers used, you have to look at this as a register 
allocation problem.
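To make the register-allocation framing concrete, here is a toy C model, not GCC code: each multiply is assigned to whichever HI/LO pair becomes free earliest, and the mnemonic (mult or mult1) then follows from that choice, mirroring the mult to mult1 modifier described in the question. The latency value, the assumption that a pair cannot accept a new multiply until its previous result is ready, and all names are hypothetical illustrations, not R5900 timings.

```c
/* Hypothetical latency, chosen only for illustration (not the real R5900 figure). */
#define MULT_LATENCY 4

/* Pair 0 = hi/lo (mult), pair 1 = hi1/lo1 (mult1). */
static const char *mult_mnemonic(int pair) {
    return pair == 0 ? "mult" : "mult1";
}

/* Toy model: n independent multiplies, issued in order, at most one per cycle.
   Assumes (hypothetically) that a pair stays busy until its result is ready.
   Each multiply gets the pair that frees up earliest; choosing the destination
   registers is what fixes the pipeline, so this is an allocation decision. */
static void assign_pairs(int n, int pair_out[], int issue_cycle[]) {
    int free_at[2] = {0, 0};   /* cycle at which each pair can accept a new mult */
    int cycle = 0;             /* earliest cycle for the next in-order issue */
    for (int i = 0; i < n; i++) {
        int p = (free_at[1] < free_at[0]) ? 1 : 0;
        int start = cycle > free_at[p] ? cycle : free_at[p];
        pair_out[i] = p;
        issue_cycle[i] = start;
        free_at[p] = start + MULT_LATENCY;
        cycle = start + 1;
    }
}
```

For four independent multiplies this model alternates the pairs (mult, mult1, mult, mult1) and issues them at cycles 0, 1, 4 and 5; with only hi/lo available the same model would issue them at cycles 0, 4, 8 and 12.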


You might dig out the old Cygnus releases.  They may provide clues, 
particularly on the register allocation tweak.


Jeff

Re: Implementing TI mode (128-bit) and the 2nd pipeline for the MIPS R5900

2016-01-19 Thread Jeff Law


On 01/19/2016 04:59 AM, Woon yung Liu wrote:


> In my current attempt at adding support for the TI mode, the MMI
> definitions are added into an MD file for the R5900 and some functions
> (e.g. mips_output_move) were modified to allow certain moves for the
> TI mode of the R5900 target. However, while it seems like TI-mode
> integers can now be passed between functions and used with the MMI
> (within one 128-bit GPR), GCC still treats 128-bit moves as complex
> moves (split across two 64-bit registers); its built-in functions
> expect both $a0 and $a1 to be used if the first argument is a 128-bit
> value. To return a 128-bit value, both $v0 and $v1 are used.
You'll have to adjust FUNCTION_ARG and its counterpart for return values
to describe how to pass these 128-bit values around.

> Otherwise, I believe that there are two solutions to the problem with
> the calling convention (but again, I have no idea which is better):
> 1. Keep the target as 64-bit. Support for MMI will be either
> compromised (i.e. made to assemble and split the 128-bit vectors upon
> entry/exit) or totally omitted. Perhaps omission would be best so
> that there will never be a compromise in performance.
>
> 2. Promote the word size of the R5900 to 128-bit. I think that SONY
> might have done this, as the code from their late games used lq/sq
> (quad-word load/store) to preserve registers. However, I think that
> this goes against the existing ABIs, doesn't it? Plus the MMI
> instruction set is proprietary and isn't used in any other MIPS.

Changing the word size to 128 bit should not be necessary.

Many ports define patterns for operations on data types that are larger 
than their native word mode.


You really need to add the new patterns for operating on 128-bit values
to the machine description and adjust the parameter passing routines.


We did have to force the compiler to assume a 64bit *host* datatype 
(long long).  I don't recall the reasoning behind that.

> If I carry on with my current design, I suppose that I need to make
> it so that the hi1/lo1 registers are never used for other MIPS
> targets. I didn't find an RTL constraint that meant something like
> "nothing", so I made the new constraints define MD1_REGS (hi1/lo1) as
> their MD_REGS (hi/lo) equivalents if the target is not the R5900
> (much like the DSP ACC register constraint, ka). But unlike the DSP
> ACC register constraint (ka), my constraints are used as alternatives
> alongside whatever (i.e. x for hi/lo or ka for hi/lo/acc) was
> originally there. Would this be acceptable, given that there will be
> two similar alternatives for some instructions when the target is not
> the R5900?
You define the registers & constraints normally.  However, you make the 
registers conditional on the target in use.  That is, if you're not on an
r5900 target, then mark those registers as fixed.  That will prevent the 
compiler from trying to use them on things other than the r5900.
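A self-contained model of the "fixed unless R5900" idea. In GCC itself this kind of per-target adjustment is normally made in the target's TARGET_CONDITIONAL_REGISTER_USAGE hook by setting entries of fixed_regs; the sketch below deliberately uses its own array and hypothetical register numbers rather than GCC internals.

```c
#include <stdbool.h>

/* Hypothetical register numbering for the model only; not GCC's MIPS numbering. */
enum { MODEL_HI, MODEL_LO, MODEL_HI1, MODEL_LO1, MODEL_NUM_REGS };

static bool model_fixed_regs[MODEL_NUM_REGS];

/* Mark hi1/lo1 as fixed (never allocated) unless compiling for the R5900. */
static void model_conditional_register_usage(bool target_r5900) {
    for (int r = 0; r < MODEL_NUM_REGS; r++)
        model_fixed_regs[r] = false;
    if (!target_r5900) {
        model_fixed_regs[MODEL_HI1] = true;
        model_fixed_regs[MODEL_LO1] = true;
    }
}

/* The allocator only considers registers that are not fixed. */
static bool model_allocatable(int regno) {
    return !model_fixed_regs[regno];
}
```

Because the registers exist in every configuration, the register and constraint definitions can stay unconditional; only their availability to the allocator changes with the target.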


Again, you may want to find the old Cygnus releases of the r5900
toolchain.  It had functional access to the second hi/lo register pair.


jeff

Re: Implementing TI mode (128-bit) and the 2nd pipeline for the MIPS R5900

2016-01-19 Thread Richard Earnshaw (lists)

On 19/01/16 14:42, Jeff Law wrote:
> We did have to force the compiler to assume a 64bit *host* datatype
> (long long).  I don't recall the reasoning behind that.
> 

Probably because historically you needed CONST_DOUBLE (with VOIDmode) to
handle 128-bit immediates (a pair of HOST_WIDE_INTs).  It may not be
necessary any more with the new wide integer types.
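For illustration, a 128-bit constant viewed as two 64-bit host words, roughly the low/high pair that a VOIDmode CONST_DOUBLE used to carry. This sketch relies on the unsigned __int128 extension that GCC and Clang provide on 64-bit hosts, and uses unsigned fields (whereas HOST_WIDE_INT is signed) so the shifts are well defined; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old (low, high) HOST_WIDE_INT pair. */
struct hwi_pair {
    uint64_t low;
    uint64_t high;
};

/* Split a 128-bit value into its low and high 64-bit halves. */
static struct hwi_pair split_ti_constant(unsigned __int128 v) {
    struct hwi_pair p;
    p.low = (uint64_t)v;
    p.high = (uint64_t)(v >> 64);
    return p;
}

/* Reassemble the 128-bit value from the pair. */
static unsigned __int128 join_ti_constant(struct hwi_pair p) {
    return ((unsigned __int128)p.high << 64) | p.low;
}
```

With a 32-bit HOST_WIDE_INT such a pair would only cover 64 bits, which is consistent with the explanation above for why a 64-bit host type was needed.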

R.

RE: Implementing TI mode (128-bit) and the 2nd pipeline for the MIPS R5900

2016-01-19 Thread Matthew Fortune

Jeff Law  writes:
> On 01/19/2016 04:59 AM, Woon yung Liu wrote:
> >
> > In my current attempt at adding support for the TI mode, the MMI
> > definitions are added into a MD file for the R5900 and some functions
> > (i.e. mips_output_move) were modified to allow certain moves for the
> > TI mode of the R5900 target. However, while it seems like TI-mode
> > integers can now be passed between functions and used with the MMI
> > (within one 128-bit GPR), GCC still treats 128-bit moves as complex
> > moves (split across two 64-bit registers); its built-in functions
> > expect both $a0 and $a1 to be used if the first argument is a 128-bit
> > value. To return a 128-bit value, both $v0 and $v1 are used.
> You'll have to adjust FUNCTION_ARG and its counterpart for return values
> to describe how to pass these 128 bit values around.

I'm generally against modified calling conventions, especially given the
number of them that MIPS already has. We opted against using new wider
registers for arguments/returns in MSA, instead choosing to consider that
as an optimised convention rather than the standard.

What environment are you looking to support this in? Linux, bare metal,
BSD, other? There's a reasonable amount of housekeeping to consider for
context switching and debug depending on the environment.

On the topic of TImode... Do you ever truly end up with TImode data with
the R5900 extensions or is it all vector types? We initially had TImode
in various places for MSA and removed it all in favour of the vector
modes which made everything a lot cleaner. If there truly is TImode
support then things get a little ugly based on what I remember from
untangling MSA from TImode mainly because of the interaction with
multiplies.
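One way to see the TImode versus vector-mode distinction: a lane-wise vector add keeps carries inside each 32-bit lane, whereas a genuine 128-bit (TImode-style) add propagates them across the whole value. The sketch uses the GCC/Clang vector_size extension and unsigned __int128; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Four 32-bit lanes in 16 bytes, similar in shape to a V4SI-style vector mode. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Lane-wise add: each lane wraps independently, no carry between lanes. */
static v4u32 add_lanes(v4u32 a, v4u32 b) {
    return a + b;
}

/* Full 128-bit add: carries cross the 32-bit lane boundaries. */
static unsigned __int128 add_ti(unsigned __int128 a, unsigned __int128 b) {
    return a + b;
}
```

Adding 1 to a lane holding 0xffffffff wraps that lane to 0 and leaves the next lane untouched, while the same bits treated as a 128-bit integer carry into bit 32.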

> > Otherwise, I believe that there are two solutions to the problem with
> > the calling convention (but again, I have no idea which is better):
> > 1. Keep the target as 64-bit. Support for MMI will be either
> > compromised (i.e. made to assemble and split the 128-bit vectors upon
> > entry/exit) or totally omitted. Perhaps omission would be best so that
> > there will never be a compromise in performance.

As above, I suggest this approach but allow vectors to be passed using
the pre-existing de facto convention and look at optimizing it later.

Matthew

RE: [Patch] MIPS FDE deletion

2016-01-19 Thread Maciej W. Rozycki

On Mon, 11 Jan 2016, Moore, Catherine wrote:

> >  Does it mean PR target/53276 has been fixed now?  What was the commit to
> > add .cfi support for the stubs?
> 
> I don't know about the status of PR target/53276.  The commit to add 
> .cfi support for call stubs was this one:
> 
> r184379 | rsandifo | 2012-02-19 08:44:54 -0800 (Sun, 19 Feb 2012) | 7 lines
> 
> gcc/
> * config/mips/mips.c (mips16_build_call_stub): Add CFI information
> to stubs with non-sibling calls.
> 
> libgcc/
> * config/mips/mips16.S (CALL_STUB_RET): Add CFI information.

 Thanks.  I thought it was something recent, but this is fairly old.

 I saw your patch handles the `fn_stub' case among others, and your test
case included an `__fn_stub_foo' stub too, which is what PR target/53276
is all about.  That is why I thought it may have been resolved and the
existence of the PR accidentally overlooked.

 BTW, your test case has a stub of the `fn_stub' kind (`__fn_stub_foo') 
and one of the `call_fp_stub' kind (`__call_stub_fp_foo'), but none of the 
`call_stub' kind (for `foo' it would be called `__call_stub_foo').  The 
latter has AFAICT been addressed by r184379.  Was the omission of the test 
case then deliberate for some reason (why?) or just accidental?

  Maciej

Re: Instruction scheduling for the R5900's 2 integer pipelines

2016-01-19 Thread Jeff Law


On 01/19/2016 09:22 AM, Woon yung Liu wrote:

> Right now, I do have an old homebrew GCC v3.2.2 port to study as
> well, but I didn't follow everything from it because I didn't want to
> risk including obsolete constructs. Thanks for the information on the
> old Cygnus port. I'll try to scrape together a working system with
> it.

Look for a change from me in local-alloc.c, circa 1998.  At least I 
think that's where I had to twiddle things.


jeff

SH runtime switchable atomics - proposed design

2016-01-19 Thread Rich Felker

I've been working on the new version of runtime-selected SH atomics
for musl, and I think what I've got might be appropriate for GCC's
generated atomics too. I know Oleg was not very excited about doing
this on the gcc side from a cost/benefit perspective, but I think my
approach is actually preferable to inline atomics from a code size
perspective. It uses a single "cas" function with an "SFUNC" type ABI
(not standard calling convention) with the following constraints:

Inputs:
- R0: Memory address to operate on
- R1: Address of implementation function, loaded from a global
- R2: Comparison value
- R3: Value to set on success

Outputs:
- R3: Old value read, ==R2 iff cas succeeded.

Preserved: R0, R2.

Clobbered: R1, PR, T.
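The register contract above can be stated as a plain C model of the compare-and-swap semantics. This is only a sequential description of what the sfunc computes (it is not atomic), and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the registers involved in the cas sfunc. */
struct sh_cas_regs {
    uint32_t *r0;   /* memory address (preserved) */
    uint32_t r2;    /* comparison value (preserved) */
    uint32_t r3;    /* in: value to set on success; out: old value read */
};

/* Sequential model only: the real implementations make this atomic. */
static void sh_cas_model(struct sh_cas_regs *r) {
    uint32_t old = *r->r0;
    if (old == r->r2)
        *r->r0 = r->r3;
    r->r3 = old;          /* r3 == r2 iff the store happened */
}
```

A caller therefore tests for success by comparing the returned R3 with R2, which is still intact because R2 is preserved.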

This call (performed from __asm__ for musl, but gcc would do it as SH
"SFUNC") is highly compact/convenient for inlining because it avoids
clobbering any of the argument registers that are likely to already be
in use by the caller, and it preserves the important values that are
likely to be reused after the cas operation.

For J2 and future J4, the function pointer just points to:

rts
 cas.l r2,r3,@r0

and the only costs vs an inline cas.l are loading the address of the
function (done in the caller; involves GOT access) and clobbering R1
and PR.

This is still a draft design and the version in musl is subject to
change at any time since it's not a public API/ABI, but I think it
could turn into something useful to have on the gcc side with a
-matomic-model=libfunc option or similar. Other ABI considerations for
gcc use would be where to store the function pointer and how to
initialize it. To be reasonably efficient with FDPIC the caller needs
to be responsible for loading the function pointer (and it needs to
always point to code, not a function descriptor) so that the callee
does not need a GOT pointer passed in.
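A C sketch of the runtime-selection idea: the caller loads an implementation pointer from a global and calls through it. The selection criterion, the names, and the non-atomic single-threaded variant are hypothetical; the generic variant uses GCC's __atomic_compare_exchange_n builtin. Unlike the proposed sfunc, this uses the normal C calling convention, so it shows only the dispatch structure, not the register savings.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*cas_fn)(uint32_t *p, uint32_t expected, uint32_t desired);

/* Generic version using GCC's atomic builtin; returns the old value. */
static uint32_t cas_builtin(uint32_t *p, uint32_t expected, uint32_t desired) {
    /* On failure the builtin writes the current value into `expected`;
       on success `expected` already equals the old value. */
    __atomic_compare_exchange_n(p, &expected, desired, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Hypothetical variant that is only valid when no other thread can run. */
static uint32_t cas_single_threaded(uint32_t *p, uint32_t expected, uint32_t desired) {
    uint32_t old = *p;
    if (old == expected)
        *p = desired;
    return old;
}

/* Global implementation pointer, chosen once at startup. */
static cas_fn cas_impl = cas_builtin;

static void cas_init(bool single_threaded) {
    cas_impl = single_threaded ? cas_single_threaded : cas_builtin;
}

/* Call site: load the pointer from the global, then call through it. */
static uint32_t cas(uint32_t *p, uint32_t expected, uint32_t desired) {
    return cas_impl(p, expected, desired);
}
```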

Rich

Re: [musl] SH runtime switchable atomics - proposed design

2016-01-19 Thread Rich Felker

On Tue, Jan 19, 2016 at 03:28:52PM -0500, Rich Felker wrote:
> This is still a draft design and the version in musl is subject to
> change at any time since it's not a public API/ABI, but I think it
> could turn into something useful to have on the gcc side with a
> -matomic-model=libfunc option or similar. Other ABI considerations for
> gcc use would be where to store the function pointer and how to
> initialize it. To be reasonably efficient with FDPIC the caller needs
> to be responsible for loading the function pointer (and it needs to
> always point to code, not a function descriptor) so that the callee
> does not need a GOT pointer passed in.

Attached is my current draft of the implementations of the cas 'sfunc'
for musl. Forgot to include it before.

Rich
/* Contract for all versions is same as cas.l r2,r3,@r0
 * pr and r1 are also clobbered (by jsr & r1 as temp).
 * r0,r2,r4-r15 must be preserved.
 * r3 contains result (==r2 iff cas succeeded). */

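	.align 2
__sh_cas_gusa:			/* gUSA kernel-restartable sequence */
	mov.l r5,@-r15		/* save r4/r5, used as scratch */
	mov.l r4,@-r15
	mov r0,r4		/* r4 = address */
	mova 1f,r0		/* r0 = end of the atomic sequence */
	mov r15,r1		/* r1 = saved stack pointer */
	mov #(0f-1f),r15	/* r15 = -(sequence length): marks the gUSA region */
0:	mov.l @r4,r5		/* load old value */
	cmp/eq r5,r2
	bf 1f			/* mismatch: skip the store */
	mov.l r3,@r4		/* match: store new value */
1:	mov r1,r15		/* leave the region, restore sp */
	mov r5,r3		/* r3 = old value */
	mov r4,r0		/* restore r0 = address */
	mov.l @r15+,r4
	rts
	 mov.l @r15+,r5		/* delay slot: restore r5 */

__sh_cas_llsc:			/* SH4A load-linked/store-conditional */
	mov r0,r1		/* r1 = address; movli/movco need r0 for data */
	synco			/* barrier before */
0:	movli.l @r1,r0		/* load-linked old value */
	cmp/eq r0,r2
	bf 1f			/* mismatch: fail with r0 = old value */
	mov r3,r0
	movco.l r0,@r1		/* store-conditional; T=1 on success */
	bf 0b			/* reservation lost: retry */
	mov r2,r0		/* success: old value was r2 */
1:	synco			/* barrier after */
	mov r0,r3		/* r3 = old value */
	rts
	 mov r1,r0		/* delay slot: restore r0 = address */

__sh_cas_imask:			/* mask interrupts; requires privileged mode */
	mov r0,r1		/* r1 = address */
	stc sr,r0
	mov.l r0,@-r15		/* save SR */
	or #0xf0,r0		/* IMASK = 15 */
	ldc r0,sr		/* interrupts now masked */
	mov.l @r1,r0		/* load old value */
	cmp/eq r0,r2
	bf 1f
	mov.l r3,@r1		/* match: store new value */
1:	ldc.l @r15+,sr		/* restore SR and the interrupt mask */
	mov r0,r3		/* r3 = old value */
	rts
	 mov r1,r0		/* delay slot: restore r0 = address */

__sh_cas_cas_l:			/* J2 hardware cas.l */
	rts
	 cas.l r2,r3,@r0	/* delay slot */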

gcc-5-20160119 is now available

2016-01-19 Thread gccadmin

Snapshot gcc-5-20160119 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20160119/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch 
revision 232591

You'll find:

 gcc-5-20160119.tar.bz2   Complete GCC

  MD5=4fd7bfbebbffc85ee8583f60bbcab476
  SHA1=120b77d0c51385058c30894918002395e3e85b73

Diffs from 5-20160112 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

Re: Source Code for Profile Guided Code Positioning

2016-01-19 Thread Sriraman Tallam

On Fri, Jan 15, 2016 at 9:51 AM, Yury Gribov  wrote:
> On 01/15/2016 08:44 PM, vivek pandya wrote:
>>
>> Thanks Yury for
>> https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html this link.
>> It implements procedure reordering as linker plugin.
>> I have some questions :
>> 1) Can you point me to some documentation for "how to write plugins
>> for linkers"? I have not seen docs for the structs with the 'ld_' prefix
>> (i.e. those defined in plugin-api.h).
>> 2) There is one more algorithm for basic block ordering with
>> execution frequency counts in the PH paper. Is there any implementation
>> available for it?
>
>
> Quite frankly - I don't know (I've only learned about Google implementation
> recently).
>
> I've added Sriram to maybe comment.

Sorry for the late response.  The google/gcc_4_9 branch has the source
of the function reordering linker plugin.  It is available in the
function_reordering_plugin directory under the top level gcc
directory.

The function reordering plugin constructs a callgraph and uses profile
information to do a Pettis Hansen style function reordering.   This
plugin does not do basic block re-ordering.
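A simplified C sketch of the greedy chain merging behind Pettis-Hansen style function ordering: process call-graph edges from heaviest to lightest and concatenate the chains of the two endpoints. The original algorithm also decides which ends of the chains to join; this sketch always appends, and the names and fixed size limit are hypothetical.

```c
#include <stdlib.h>

#define MAX_FUNCS 16  /* hypothetical limit; n must not exceed it */

struct edge { int a, b; long weight; };  /* call-graph edge with profile count */

static int by_weight_desc(const void *x, const void *y) {
    long wx = ((const struct edge *)x)->weight;
    long wy = ((const struct edge *)y)->weight;
    return (wx < wy) - (wx > wy);
}

/* Writes a layout order for functions 0..n-1 into out[]. Sorts edges in place. */
static void order_functions(int n, struct edge *edges, int m, int *out) {
    int chain_of[MAX_FUNCS];  /* head of the chain containing each function */
    int next[MAX_FUNCS];      /* successor within its chain, or -1 */
    int tail[MAX_FUNCS];      /* last element of the chain headed by f */
    for (int f = 0; f < n; f++) { chain_of[f] = f; next[f] = -1; tail[f] = f; }
    qsort(edges, m, sizeof *edges, by_weight_desc);
    for (int i = 0; i < m; i++) {
        int ha = chain_of[edges[i].a], hb = chain_of[edges[i].b];
        if (ha == hb)
            continue;                  /* already in the same chain */
        next[tail[ha]] = hb;           /* append chain hb after chain ha */
        tail[ha] = tail[hb];
        for (int f = hb; f != -1; f = next[f])
            chain_of[f] = ha;
    }
    int k = 0;                         /* emit chains in order of their heads */
    for (int f = 0; f < n; f++)
        if (chain_of[f] == f)
            for (int g = f; g != -1; g = next[g])
                out[k++] = g;
}
```

With edges 0-3 (10), 1-2 (50) and 0-1 (5), the hot pair 1,2 is merged first and the resulting order is 0, 3, 1, 2.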

There is no documentation as such that I am aware of to write a linker
plugin.  Here is a very brief overview.   The linker calls the
plugin's "onload" function when registering the plugin and the plugin
in turn can register two call-backs with the linker, "claim_file_hook"
and the "all_symbols_read_hook".  "claim_file_hook" is called for
each object file that the linker processes and the
"all_symbols_read_hook" is called after all the symbols have been read
by the linker.  These are just two different interesting points in the
course of a link.

The plugin can also get handles to linker functions like
"get_input_section_name" which it can use to process sections given
their handle. You can also check the gold linker tests for simpler
plugin examples.
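A mock of the callback sequence described above, in plain C. It does not use plugin-api.h: the real onload receives a transfer vector (struct ld_plugin_tv) from which the plugin obtains its registration functions, whereas this mock linker simply exposes two function-pointer slots. All names are hypothetical.

```c
/* Hypothetical mock of the linker/plugin handshake; these are NOT plugin-api.h types. */
typedef void (*claim_file_cb)(const char *object_name);
typedef void (*all_symbols_read_cb)(void);

struct mock_linker {
    claim_file_cb claim_file;
    all_symbols_read_cb all_symbols_read;
};

static int objects_seen;
static int all_symbols_read_calls;

static void my_claim_file(const char *object_name) {
    (void)object_name;          /* a real plugin would inspect the object here */
    objects_seen++;
}

static void my_all_symbols_read(void) {
    all_symbols_read_calls++;   /* a real plugin would compute its ordering here */
}

/* Plugin side: onload registers the two callbacks with the linker. */
static void mock_onload(struct mock_linker *ld) {
    ld->claim_file = my_claim_file;
    ld->all_symbols_read = my_all_symbols_read;
}

/* Linker side: load the plugin, offer each object, then signal the end. */
static void mock_link(const char *const objects[], int n) {
    struct mock_linker ld = {0};
    mock_onload(&ld);
    for (int i = 0; i < n; i++)
        if (ld.claim_file)
            ld.claim_file(objects[i]);
    if (ld.all_symbols_read)
        ld.all_symbols_read();
}
```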

HTH,
Thanks
Sri

>
> -Y
