RE: Allocation of hotness of data structure with respect to the top of stack.

2015-07-10 Thread Matthew Fortune
Vladimir Makarov  writes:
> On 2015-07-05 7:11 AM, Ajit Kumar Agarwal wrote:
> > All:
> >
> > I am wondering whether allocating hot data structures closer to the top
> > of the stack increases the performance of the application.
> > The data structures are identified as hot and cold, and all the data
> > structures are sorted in decreasing order of hotness; the hot data
> > structures will be allocated closer to the top of the stack.
> >
> > The loads and stores that access stack-allocated data structures will be
> > faster when the hot data structures are allocated closer to the top of
> > the stack.
> >
> > Based on the above, code is generated with loads and stores using the
> > correct offsets for stack slots allocated in decreasing order of hotness.
> >
> >
> What can be done for this in RA was already done for LRA and reload
> about seven years ago.  More frequently used slots for spilled pseudos
> are placed closer to the address base (stack pointer or frame pointer).
> It gave a considerable code size improvement and a nice performance
> improvement on x86/x86-64, as displacements in the [-128, 127] range need
> less memory in x86 insns.  The details can be found in my presentation on
> IRA or LRA at a GCC summit.
> 
> Placement of local arrays and structures is actually done in the
> front end.  I remember the existence of the problem and that somebody
> tried to address it but don't remember the result.  You can try to
> figure it out by searching the gcc mail archives.

One thing I have seen for MIPS is that we need to switch our frame to grow
downwards so that spill slots end up closer to SP than any potentially large
local data structures. This gives a noticeable code size benefit for
compressed ISAs (MIPS16 and microMIPS).

This change has been buried on a list of mine for some time and is trivial to
implement given we already have a macro for it.
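For reference, the relevant hook is the FRAME_GROWS_DOWNWARD target macro; a
minimal sketch of the switch (ignoring any frame layout code that might also
need auditing) would be:

/* Sketch only: make the local frame grow downwards so that spill slots,
   which are allocated last, end up nearest the stack pointer.  */
#define FRAME_GROWS_DOWNWARD 1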

Matthew


RE: Question about "instruction merge" pass when optimizing for size

2015-08-19 Thread Matthew Fortune
DJ Delorie  writes:
> I've seen this on other targets too, sometimes so bad I write a quick
> target-specific "stupid move optimizer" pass to clean it up.
> 
> A generic pass would be much harder, but very useful.

Robert (on cc) is currently attempting some improvements to the regrename
pass to try to propagate registers back from the destination of a move
so that the move becomes a no-op. This is for a few cases we have
seen for MIPS. If successful, that may clean up a small number of such
problems for all targets.

Thanks,
Matthew


RE: GTY / gengtype question - adding a new header file

2015-09-01 Thread Matthew Fortune
Steve Ellcey  writes:
> On Tue, 2015-09-01 at 10:13 +0200, Georg-Johann Lay wrote:
> 
> >
> > I'd have a look at what BEs are using non-default target_gtfiles.
> >
> > Johann
> 
> There are a few BEs that add a .c file to target_gtfiles, but no
> platforms that add a .h file to target_gtfiles.  I do see a number
> of platforms that define the machine_function structure in their header
> file (aarch64.h, pa.h, i386.h) instead of their .c file though.
> 
> Maybe that is a better way to go for MIPS instead of doing something
> completely new.  If I move machine_function, mips_frame_info,
> mips_int_mask, and mips_shadow_set from mips.c to mips.h then I could
> put my new machine specific pass in a separate .c file from mips.c and
> not need to do anything with target_gtfiles.  The only reason I didn't
> want to do this was so that machine_function wasn't visible to the rest
> of GCC but that doesn't seem to have been an issue for other targets.

Personally I quite like the idea of separating out code if at all possible
so having to expose a bit more of the MIPS backend internals does not
seem like too high a cost.

Matthew


RE: RFC: Support x86 interrupt and exception handlers

2015-09-15 Thread Matthew Fortune
H.J. Lu  writes:
> On Thu, Sep 3, 2015 at 10:37 AM, H.J. Lu  wrote:
> > The interrupt and exception handlers are called by x86 processors.  X86
> > hardware puts information on stack and calls the handler.  The
> > requirements are
> >
> > 1. Both interrupt and exception handlers must use the 'IRET' instruction,
> > instead of the 'RET' instruction, to return from the handlers.
> > 2. All registers are callee-saved in interrupt and exception handlers.
> > 3. The difference between interrupt and exception handlers is the
> > exception handler must pop 'ERROR_CODE' off the stack before the 'IRET'
> > instruction.
> >
> > The design goals of interrupt and exception handlers for x86 processors
> > are:
> >
> > 1. No new calling convention in compiler.
> > 2. Support both 32-bit and 64-bit modes.
> > 3. Flexible for compilers to optimize.
> > 4. Easy to use by programmers.
> >
> > To implement interrupt and exception handlers for x86 processors, a
> > compiler should support:
> >
> > 1. void * __builtin_ia32_interrupt_data (void)
> 
> I got feedback on the name of this builtin function.  Since
> it also works for 64-bit, we should avoid ia32 in its name.
> We'd like to change it to
> 
> void * __builtin_interrupt_data (void)
> 
> Any comments?

For what it's worth, this seems like a good plan to me. I don't know x86
but how many variations of interrupt and exception handling mechanisms
are there? If there are lots then you may want to make it clear which
subset of them you intend to support. I just added a few more variations
of interrupt handlers to MIPS and it got complicated quite quickly.

I think I remember someone asking about interrupt handler support for
x86 some time ago and the answer then was that there were too many
variants to make it useful.

Thanks,
Matthew

> 
> > This function returns a pointer to interrupt or exception data pushed
> > onto the stack by processor.
> >
> > The __builtin_frame_address builtin isn't suitable for interrupt and
> > exception handlers since it returns the stack frame address on the
> > callee side and compiler may generate a new stack frame for stack
> > alignment.
> >
> > 2. 'interrupt' attribute
> >
> > Use this attribute to indicate that the specified void function without
> > arguments is an interrupt handler.  The compiler generates function entry
> > and exit sequences suitable for use in an interrupt handler when this
> > attribute is present.  The 'IRET' instruction, instead of the
> > 'RET' instruction, is used to return from interrupt handlers.  All
> > registers, except for the EFLAGS register which is restored by the
> > 'IRET' instruction, are preserved by the compiler.  The red zone
> > isn't supported in an interrupt handler; that is an interrupt
> > handler can't access stack beyond the current stack pointer.
> >
> > You can use the builtin '__builtin_ia32_interrupt_data' function to access
> > data pushed onto the stack by processor:
> >
> > void
> > f () __attribute__ ((interrupt))
> > {
> >   void *p = __builtin_ia32_interrupt_data ();
> >   ...
> > }
> >
> > 3. 'exception' attribute
> >
> > Use 'exception' instead of 'interrupt' for handlers intended to be
> > used for 'exception' (i.e. those that must pop 'ERROR_CODE' off the
> > stack before the 'IRET' instruction):
> >
> > void
> > f () __attribute__ ((exception))
> > {
> >   void *p = __builtin_ia32_interrupt_data ();
> >   ...
> > }
> >
> > Any comments, suggestions?
> >
> > Thanks.
> >
> >
> > --
> > H.J.
> 
> 
> 
> --
> H.J.


RE: RFC: Support x86 interrupt and exception handlers

2015-09-16 Thread Matthew Fortune
H.J. Lu  writes:
> On Tue, Sep 15, 2015 at 2:45 PM, Matthew Fortune
>  wrote:
> > H.J. Lu  writes:
> >> On Thu, Sep 3, 2015 at 10:37 AM, H.J. Lu  wrote:
> >> > The interrupt and exception handlers are called by x86 processors.  X86
> >> > hardware puts information on stack and calls the handler.  The
> >> > requirements are
> >> >
> >> > 1. Both interrupt and exception handlers must use the 'IRET' instruction,
> >> > instead of the 'RET' instruction, to return from the handlers.
> >> > 2. All registers are callee-saved in interrupt and exception handlers.
> >> > 3. The difference between interrupt and exception handlers is the
> >> > exception handler must pop 'ERROR_CODE' off the stack before the 'IRET'
> >> > instruction.
> >> >
> >> > The design goals of interrupt and exception handlers for x86 processors
> >> > are:
> >> >
> >> > 1. No new calling convention in compiler.
> >> > 2. Support both 32-bit and 64-bit modes.
> >> > 3. Flexible for compilers to optimize.
> >> > 4. Easy to use by programmers.
> >> >
> >> > To implement interrupt and exception handlers for x86 processors, a
> >> > compiler should support:
> >> >
> >> > 1. void * __builtin_ia32_interrupt_data (void)
> >>
> >> I got feedback on the name of this builtin function.  Since
> >> it also works for 64-bit, we should avoid ia32 in its name.
> >> We'd like to change it to
> >>
> >> void * __builtin_interrupt_data (void)
> >>
> >> Any comments?
> >
> > For what it's worth, this seems like a good plan to me. I don't know x86
> > but how many variations of interrupt and exception handling mechanisms
> > are there? If there are lots then you may want to make it clear which
> > subset of them you intend to support. I just added a few more variations
> > of interrupt handlers to MIPS and it got complicated quite quickly.
> >
> > I think I remember someone asking about interrupt handler support for
> > x86 some time ago and the answer then was that there were too many
> > variants to make it useful.
> 
> In my proposal, there are only 2 handlers: interrupt and exception.
> __builtin_interrupt_data is provided to programmers to implement
> different variants of those handlers.

Yes, I realised that, but I was just curious how many hardware interrupt
handling schemes there are for x86, i.e. how the handlers are glued into
an interrupt vector and how they get routed. Is there a generic piece of
code that could at least hook up/install an exception handler on most
x86 variants?

Matthew


RE: Implementing TI mode (128-bit) and the 2nd pipeline for the MIPS R5900

2016-01-19 Thread Matthew Fortune
Jeff Law  writes:
> On 01/19/2016 04:59 AM, Woon yung Liu wrote:
> >
> > In my current attempt at adding support for the TI mode, the MMI
> > definitions are added into a MD file for the R5900 and some functions
> > (i.e. mips_output_move) were modified to allow certain moves for the
> > TI mode of the R5900 target. However, while it seems like TI-mode
> > integers can now be passed between functions and used with the MMI
> > (within one 128-bit GPR), GCC still treats 128-bit moves as complex
> > moves (split across two 64-bit registers); its built-in functions
> > expect both $a0 and $a1 to be used if the first argument is a 128-bit
> > value. To return a 128-bit value, both $v0 and $v1 are used.
> You'll have to adjust FUNCTION_ARG and its counterpart for return values
> to describe how to pass these 128 bit values around.

I'm generally against modified calling conventions, especially given the
number of them that MIPS already has. We opted against using the new wider
registers for arguments/returns in MSA, instead choosing to consider that
an optimised convention rather than the standard one.

What environment are you looking to support this in? Linux, bare metal,
BSD, other? There's a reasonable amount of housekeeping to consider for
context switching and debug depending on the environment.

On the topic of TImode... Do you ever truly end up with TImode data with
the R5900 extensions or is it all vector types? We initially had TImode
in various places for MSA and removed it all in favour of the vector
modes which made everything a lot cleaner. If there truly is TImode
support then things get a little ugly based on what I remember from
untangling MSA from TImode mainly because of the interaction with
multiplies.

> > Otherwise, I believe that there are two solutions to the problem with
> > the calling convention (but again, I have no idea which is better):
> > 1. Keep the target as 64-bit. Support for MMI will be either
> > compromised (i.e. made to assemble and split the 128-bit vectors upon
> > entry/exit) or totally omitted. Perhaps omission would be best so that
> > there will never be a compromise in performance.

As above, I suggest this approach but allow vectors to be passed using
the pre-existing de facto convention and look at optimizing it later.

Matthew




RE: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register

2016-01-25 Thread Matthew Fortune
Andreas Arnez  writes:
> 6 Summary of Open Questions
> ===
> 
>   1. Out of the standard interpretations discussed under "options"
>  (section 4) above, which do we want to settle on?  Or is the
>  "preferred" interpretation missing from that list?
>   2. Should pieces fully or partially outside their underlying objects
>  be considered valid or invalid?  If valid, how should they be
>  aligned and padded?  In any case, what is the suggested treatment
>  by a DWARF consumer?

My DWARF knowledge is not brilliant, but I have recently had to consider
it for the MIPS floating-point ABI changes, aka FPXX and friends. I don't
know exactly where this fits into your whole description, but in case it
has a bearing on it we now have the following uses of DW_OP_piece:

1) double precision data split over two 32-bit FPRs
Uses a pair of 32-bit DW_OP_piece (ordered depending on endianness).

2) double precision data in one 64-bit FPR
No need for DW_OP_piece.

3) double precision data that may be in two 32-bit FPRs or may be in
   one 64-bit FPR depending on hardware mode
Uses a single 64-bit DW_OP_piece on the even numbered register. 

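For illustration only (the register numbers assume the usual o32 numbering
where $f12 is DWARF register 44; adjust as needed), the three cases written
out as raw DWARF expression bytes look roughly like this:

#include <stdint.h>

/* DW_OP_regx (0x90) takes a ULEB128 register number and DW_OP_piece (0x93)
   a ULEB128 size in bytes.  */

/* (1) double split across $f12/$f13, little-endian ordering shown.  */
static const uint8_t loc_two_fp32[] = {
  0x90, 44, 0x93, 4,    /* DW_OP_regx 44; DW_OP_piece 4 */
  0x90, 45, 0x93, 4,    /* DW_OP_regx 45; DW_OP_piece 4 */
};

/* (2) double in a single 64-bit $f12: a plain register location.  */
static const uint8_t loc_one_fp64[] = { 0x90, 44 };

/* (3) FPXX: a single 64-bit piece on the even-numbered register.  */
static const uint8_t loc_fpxx[] = { 0x90, 44, 0x93, 8 };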
I'm guilty of not actually finishing this off and doing the GDB side, but
the theory seemed OK when I did it! From your description, this behaviour
best matches DW_OP_piece having ABI-dependent behaviour, which would make
it acceptable. These three variations can potentially exist in the same
program, albeit (1) and (3) can't appear in a single shared library
or executable. It's all a little strange, but the whole floating-point
MIPS o32 ABI is pretty complex now.

Matthew


RE: [RFC] DW_OP_piece vs. DW_OP_bit_piece on a Register

2016-02-11 Thread Matthew Fortune
Sorry for the slow response...

Andreas Arnez  writes:
> On Mon, Jan 25 2016, Matthew Fortune wrote:
> 
> > My dwarf knowledge is not brilliant but I have had to recently
> > consider it for MIPS floating point ABI changes aka FPXX and friends.
> > I don't know exactly where this fits in to your whole description but
> > in case it has a bearing on this we now have the following uses of
> DW_OP_piece:
> >
> > 1) double precision data split over two 32-bit FPRs
> > Uses a pair of 32-bit DW_OP_piece (ordered depending on endianness).
> >
> > 2) double precision data in one 64-bit FPR
> > No need for DW_OP_piece.
> >
> > 3) double precision data that may be in two 32-bit FPRs or may be in
> >one 64-bit FPR depending on hardware mode
> > Uses a single 64-bit DW_OP_piece on the even numbered register.
> 
> Hm, so in 32-bit hardware mode the DWARF consumer is expected to treat
> the DW_OP_piece on the even numbered register as if it were two pieces
> from two consecutive registers?

Yes.

> Or should we rather consider the even
> numbered register to be 64 bit wide, where the right half shadows the
> next odd-numbered register?  If so, I believe you generally want pieces
> from FPRs to be taken from the left, correct?

No, I think this is backwards: it is the left half that shadows the next
register and pieces are taken from the right. I've attempted a description
below to see if it helps.

I don't believe (in the MIPS case) we could unconditionally view the
even-numbered register as 64-bit or 32-bit, as the shadowing onto the next
register only exists in some hardware modes.

The size of a register has to be determined from the current hardware mode,
and then the logic would be to read as much as possible from the referenced
register and use it as the lower bits of the overall value, then continue
reading consecutive registers, filling the next most significant bits,
until the full size of the DW_OP_piece has been read. For MIPS FP registers
this is endian-agnostic, as the higher-numbered register always has the
most significant bits. For GPRs the even-numbered register will provide
either the most or least significant bits depending on endianness, but we
have no reason to use this paradoxical DW_OP_piece for GPRs as they have a
compile-time-deterministic size.
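As a rough sketch of that logic (regcache_read here is a made-up helper that
returns the current width and raw bits of a DWARF register in the active
hardware mode; it is not GDB's real interface):

#include <stddef.h>
#include <stdint.h>

struct raw_reg { size_t size; uint64_t bits; };
extern struct raw_reg regcache_read (int dwarf_regno);

/* Read a DW_OP_piece of PIECE_SIZE bytes that names DWARF register REGNO:
   take as much as possible from REGNO for the least significant bits and
   keep filling more significant bits from consecutive registers.  */
static uint64_t
read_piece (int regno, size_t piece_size)
{
  uint64_t value = 0;
  size_t filled = 0;

  while (filled < piece_size)
    {
      struct raw_reg r = regcache_read (regno++);
      size_t take = r.size < piece_size - filled ? r.size : piece_size - filled;
      uint64_t mask = take >= 8 ? ~UINT64_C (0) : (UINT64_C (1) << (take * 8)) - 1;
      value |= (r.bits & mask) << (filled * 8);
      filled += take;
    }
  return value;
}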

> > I'm guilty of not actually finishing this off and doing the GDB side
> > but the theory seemed OK when I did it! From your description this
> > behaviour best matches DW_OP_piece having ABI dependent behaviour
> > which would make it acceptable. These three variations can potentially
> > exist in the same program albeit that (1) and (3) can't appear in a
> > single shared library or executable. It's all a little strange but the
> > whole floating point MIPS o32 ABI is pretty complex now.
> 
> I don't quite understand why (1) and (3) can not coexist in the same
> shared library or executable.  Can you elaborate a bit?

Oops. Sorry, it is (1) and (2) that can't coexist. I'm not sure you
really want to know the gory details, but the explanation is below if
you're feeling brave.

The reason these can't co-exist is really just because there would need
to be yet another ABI variant/ELF marker to record the requirements of
such an executable. It would be a combination of FP64A and FP32 and that
would mandate a hardware mode of FR=1 FRE=1 which is the one mode that
we desperately do not want to be in as it uses kernel emulation to make
it all work. The combination of FP64A and FP32 ABIs is supported to
enable some level of transition from original o32 (FP32) through to FP64
without requiring moving everything to FPXX first. We allow this across
a shared library boundary to give just enough support for software to
transition. The aim is to encourage the full transition to FPXX rather
than going through a period of creating binaries that will always need
kernel emulation regardless of the host architecture.
 
> I'm curious about the interaction with vector registers.  AFAIK, vector
> registers on MIPS also embed the FPRs (left or right?).

Probably best to compare NEON with other SIMD. MSA works like NEON on
AArch64 rather than AArch32. I.e. it widens each register to 128-bit
and uses the same DWARF registers as FPRs. A DW_OP_piece therefore
corresponds to part of the 128-bit register. 

> Are the same DWARF register numbers used for both?

Yes. I think we can get away with using the same DWARF numbers, as the
FPRs sit in the LSBs of the vector register regardless of endianness or
double/single data, but that is a moot point; see below.

> And when taking a 64-bit DW_OP_piece from a vector register, would
> this piece correspond to the embedded FPR?

Strict architecture definition says no as the register sets do not
necessarily have to b

RE: (MIPS R5900) Adding support for the FPU (COP1) instructions

2016-03-30 Thread Matthew Fortune
Woon yung Liu  writes:
> Hi all,
> 
> Thank you all for the help so far. This is probably the final part of my
> efforts to complete support for the R5900 within GCC, but I need help
> this time because the existing homebrew GCC version has no support for
> this (despite what its README file says). Hence I have nothing to refer
> to.
> 
> The R5900 has support for a couple of floating-point arithmetic operations
> via its FPU (COP1). The FPU instructions are something like these:
> MADD.S (rd <- ACC + rs * rt)
> MADDA.S (ACC <- ACC + rs * rt)
> MSUB.S  (rd <- ACC - rs * rt)
> MSUBA.S (ACC <- ACC - rs * rt)
> ADDA.S (ACC <- rs + rt)
> SUBA.S (ACC <- rs - rt)
> MULA.S (ACC <- rs * rt)
> 
> These instructions are similar to those floating-point instructions with
> similar-looking names, in normal MIPS processors. But they involve the
> R5900's FPU accumulator (ACC) register instead.
> I didn't find an explicit instruction to move values to/from the ACC
> register.

How wide is the ACC register on r5900? Is it just 64-bit or does it offer
wider precision?

The move into ACC should be achievable with ADDA.S where rt == 0.0, and the
move out with MADD.S where rs and rt == 0.0. If ACC is wider than 64-bit
then there will be some rounding to account for.

You will also need to know whether the madd/msub instructions are fused or
unfused (i.e. with or without intermediate rounding), as GCC will use them
differently depending on the answer (see the fma4 pattern).
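A small example of why the distinction matters, relying only on the generic
rules (the fma4 pattern must be fused, and contraction of a * b + c is
governed by -ffp-contract) rather than anything R5900 specific:

float
maybe_contracted (float a, float b, float c)
{
  return a * b + c;                  /* contraction controlled by -ffp-contract */
}

float
always_fused (float a, float b, float c)
{
  return __builtin_fmaf (a, b, c);   /* fused fma4 pattern if available,
                                        otherwise a call to fmaf */
}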

> I've added instruction patterns (or added new alternatives to existing
> patterns) to the pattern file for the R5900, but I have not seen GCC
> emit any of these new instructions. I did add a new register (the ACC
> register), as well as a matching constraint and predicate.
> 
> Is there a proper way for me to debug this? Or is this not working,
> simply because GCC can't support such instructions on its own (it looks
> complicated to issue)?

Given that the base architecture has support for all the operations you
listed (using just FPRs), there needs to be some way to indicate why
the accumulator-based patterns should be used instead (or you could
simply use these patterns only when targeting the FPU extension in the
R5900). So... under what circumstances is it preferable to use these
instructions?

Thanks,
Matthew

> 
> Thanks and regards,
> -W Y


RE: [RFC v2] MIPS ABI Extension for IEEE Std 754 Non-Compliant Interlinking

2016-05-16 Thread Matthew Fortune
Hi Maciej,

Thanks for the update.  I've read through the whole proposal again and
it looks good.  I'd like to discuss legacy objects a bit more though...

Maciej Rozycki  writes:
> 3.4 Relocatable Object Generation
> 
>  Tools that produce relocatable objects such as the assembler shall
> always produce a SHT_MIPS_ABIFLAGS section according to the IEEE Std 754
> compliance mode selected.  In the absence of any explicit user
> instructions the `strict' mode shall be assumed.  No new `legacy'
> objects shall be produced.

Is it necessary to say that no new legacy objects can be created?

I think there is value in still being able to generate legacy objects
because strict executables leave no room for override at runtime.
Apart from the fact that strict cannot be disabled, there is otherwise no
difference between the legacy and strict compliance modes.

I believe the strict option is really intended for conscious use, so that
programmers who know they need it can use it. Ordinary users still get the
casual safety they need, as legacy objects are just as good as strict until
overridden. If we lose the ability to override, then in some environments we
will accumulate lots of needlessly strict executables just because of a tools
upgrade, whereas the old tools would have generated executables that were just
as safe but could also be overridden by kernel options.

Allowing legacy objects to be generated may also allow the linkage rules to
be tightened, i.e. forcing a relaxed mode at link time could simply fail
if confronted with a strict object, instead allowing only legacy objects to
be relaxed.

A default build of GCC and binutils would therefore still generate legacy
objects until someone consciously updated the configure options or used
command line options.

Thanks,
Matthew


RE: (R5900) Implementing Vector Support

2016-06-03 Thread Matthew Fortune
Woon yung Liu  writes:
> On Wednesday, June 1, 2016 5:45 AM, Richard Henderson 
> wrote:
> > This is almost always incorrect, and certainly before reload.
> > You need to use gen_lowpart.  There are examples in the code
> 
> > fragments that I sent the other week.
> 
> The problem is that gen_lowpart() doesn't seem to support casting to
> other modes of the same size.
> When I use it, the assert within gen_lowpart_general() will fail due to
> gen_lowpart_common() rejecting the operation (new mode size is not
> smaller than the old).

The conclusion we came to when developing MSA is that simplify_gen_subreg
is the way to go for converting between vector modes:

simplify_gen_subreg (new_mode, rtx, old_mode, 0)

I'm not sure there is much need to change modes after reload so do it
upfront at expand time or when splitting and you should be OK.

See trunk mips.c for a number of uses of this when converting vector modes.
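For example, a fragment along these lines (GCC-internals C; the mode names
are purely illustrative):

/* Reinterpret a V2DImode value as V4SImode without a copy.  Byte offset 0
   keeps the whole register; only the mode changes.  */
static rtx
view_as_v4si (rtx op)
{
  return simplify_gen_subreg (V4SImode, op, V2DImode, 0);
}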
 
> > You need to read the gcc internals documentation.  They are all three
> different
> 
> > uses, though there is some overlap between define_insn and
> define_expand.
> 
> I actually read the GCC internals documentation before I even begun any
> attempt at this, but there was still a lot that I did not successfully
> grasp.
> 
> I'll go with define_expand then.

define_expand only provides a way to generate an instruction or sequence
of instructions to achieve the overall goal. You must also have
define_insn definitions for any pattern you emit or the generated code
will fail to match.

A define_insn_and_split is just shorthand for a define_insn where one
or more output patterns are '#' (split) and you want to define the
split alongside the instruction rather than as a separate define_split.
As far as I understand the difference is syntactic sugar.

> But I am already doubting that I will complete this port as I can no
> longer see a favourable conclusion.

It may take time but I'm sure we can help talk through the problems. As
a new GCC developer you are a welcome addition to the community.

Thanks,
Matthew



RE: Return value on MIPS N64 ABI

2016-06-13 Thread Matthew Fortune
Heiher  writes:
> It looks like the return value of TestNewA is passed in $f0/$f2, from the
> disassembly code.  I don't know why the return value of TestNewB is passed
> in $v0/$v1. A bug?

I believe this is an area where GNU strays from the N64 ABI definition but
is the de facto standard. TestA is a struct of two floating-point fields,
which is passed and returned in FP registers. TestB is a struct of a struct
of two floating-point fields (or at least I think that is the interpretation).

The ABI does say that this should be flattened and seen as simply two
floating-point fields, but GCC does not do this and passes it in integer
registers instead. Or at least the ABI says this for arguments but not for
results.

The relevant bit of the ABI we are not adhering to is 'Structs, unions' on
page 7, which covers arguments; however, the corresponding text for results
does not include the part about ignoring the struct field structure
when determining between floating-point and integer chunks.

https://dmz-portal.ba.imgtec.org/mw/images/6/6f/007-2816-005-1.pdf

FWIW, the Clang/LLVM ABI implementation matches GCC in this regard, as we
run cross-linking tests and use GCC as 'correct'.

Thanks,
Matthew

> 229 00012c40 <_Z8TestNewAv>:
> 230    12c40:   3c030002   lui     v1,0x2
> 231    12c44:   0079182d   daddu   v1,v1,t9
> 232    12c48:   64638400   daddiu  v1,v1,-31744
> 233    12c4c:   dc628050   ld      v0,-32688(v1)
> 234    12c50:   67bdffe0   daddiu  sp,sp,-32
> 235    12c54:   d4400e68   ldc1    $f0,3688(v0)
> 236    12c58:   dc628050   ld      v0,-32688(v1)
> 237    12c5c:   67bd0020   daddiu  sp,sp,32
> 238    12c60:   03e00008   jr      ra
> 239    12c64:   d4420e70   ldc1    $f2,3696(v0)
> 240
> 241 00012c68 <_Z8TestNewBv>:
> 242    12c68:   3c0307f9   lui     v1,0x7f9
> 243    12c6c:   3c0207f7   lui     v0,0x7f7
> 244    12c70:   3463ffff   ori     v1,v1,0xffff
> 245    12c74:   3442ffff   ori     v0,v0,0xffff
> 246    12c78:   00031cb8   dsll    v1,v1,0x12
> 247    12c7c:   000214b8   dsll    v0,v0,0x12
> 248    12c80:   3463cccd   ori     v1,v1,0xcccd
> 249    12c84:   3442cccd   ori     v0,v0,0xcccd
> 250    12c88:   67bdfff0   daddiu  sp,sp,-16
> 251    12c8c:   00031c78   dsll    v1,v1,0x11
> 252    12c90:   00021478   dsll    v0,v0,0x11
> 253    12c94:   6463999a   daddiu  v1,v1,-26214
> 254    12c98:   6442999a   daddiu  v0,v0,-26214
> 255    12c9c:   03e00008   jr      ra
> 256    12ca0:   67bd0010   daddiu  sp,sp,16
> 
> // test.cpp
> // gcc -march=mips64r2 -mabi=64 -O3 -o test test.cpp
> #include 
> 
> class TestA
> {
> public:
> double l;
> double h;
> 
> TestA(double l, double h) : l(l), h(h) {} };
> 
> class TestB : public TestA
> {
> public:
> TestB(const TestA& a) : TestA(a) {} };
> 
> TestA
> TestNewA(void)
> {
> return TestA(0.1, 0.2);
> }
> 
> TestB
> TestNewB(void)
> {
> return TestA(0.1, 0.2);
> }
> 
> int
> main(int argch, char *argv[])
> {
> TestA a = TestNewA();
> printf("%lf, %lf\n", a.l, a.h);
> 
> TestB b = TestNewB();
> printf("%lf, %lf\n", b.l, b.h);
> 
> return 0;
> }


RE: [RFC] Rationale for passing vectors by value in SIMD registers

2016-07-17 Thread Matthew Fortune
Andrew Pinski  writes:
> On Sat, Feb 15, 2014 at 12:16 AM, Matthew Fortune
>  wrote:
> >> On Fri, Feb 14, 2014 at 2:17 AM, Matthew Fortune
> >>  wrote:
> >> > MIPS is currently evaluating the benefit of using SIMD registers to pass
> >> vector data by value. It is currently unclear how important it is for 
> >> vector data
> >> to be passed in SIMD registers. I.e. the need for passing vector data by 
> >> value
> >> in real world code is not immediately obvious. The performance advantage is
> >> therefore also unclear.
> >> >
> >> > Can anyone offer insight in the rationale behind decision decisions made
> >> for other architectures ABIs? For example, the x86 and x86_64 calling
> >> convention for vector data types presumes that they will passed in SSE/AVX
> >> registers and raises warnings if passed when sse/avx support is not 
> >> enabled.
> >> This is what MIPS is currently considering however there are two concerns:
> >> >
> >> > 1) What about the ability to create architecture/implementation
> >> independent APIs that may include vector types in the prototypes. Such APIs
> >> may be built for varying levels of hardware support to make the most of a
> >> specific architecture implementation but be called from otherwise
> >> implementation agnostic code. To support such a scenario we would need to
> >> use a common calling convention usable on all architecture variants.
> >> > 2) Although vector types are not specifically covered by existing ABI
> >> definitions for MIPS we have unfortunately got a defacto standard for how
> >> to pass these by value. Vector types are simply considered to be small
> >> structures and passed as such following normal ABI rules. This is still a
> >> concern even though it is generally accepted that there is some room for
> >> change when it comes to vector data types in an existing ABI.
> >> >
> >> > If anyone could offer a brief history the x86 ABI with respect to vector 
> >> > data
> >> types that may also be interesting. One question would be whether the use
> >> of vector registers in the calling convention was only enabled by default 
> >> once
> >> there was a critical mass of implementations, and therefore the default ABI
> >> was changed to start making assumptions about the availability of features
> >> like SSE and AVX.
> >> >
> >> > Comments from any other architecture that has had to make such changes
> >> over time would also be welcome.
> >>
> >> PPC and arm and AARCH64 are common targets where vectors are
> >> passed/return via value.  The idea is simple, sometimes you have functions
> >> like vector float vsinf(vector float a) where you want to be faster and 
> >> avoid a
> >> round trip to L1 (or even L2).  These kind of functions are common for 
> >> vector
> >> programming.  That is extending the scalar versions to the vector versions.
> >
> > I suppose this cost (L1/L2) is mitigated to some extent if the base ABI
> > were to pass a vector in multiple GP/FP registers rather than via the
> > stack. There would of course still be a cost to marshall the data between
> > GP/FP and SIMD registers. For a support routine like vsinf I would expect
> > it also needs a reduced clobber set to ensure that the caller's live SIMD
> > registers don't need saving/restoring; such registers would normally be
> > caller-saved. If the routine were to clobber all SIMD registers anyway
> > then the improvement in argument passing seems negligible.
> >
> > Do you/anyone know of any open source projects which have started adopting
> > generic vector types and show the use of this kind of construct?
> 
> Yes glibc provides these functions on x86 now.

Wow, old thread, you must have a good memory! I saw libmvec go in some time
ago; I guess you are referring to that, or is there something else now? (I'm
out of date with glibc development.)

I am hoping to steer MIPS towards supporting passing vectors by value via an
ABI extension that is opt-in rather than the default. The main reason is the
range of competing vector extensions, whether defined as official ASEs or
core specific. I think we can still get vectors passed by value, with the
only extra requirement being that a prototype would need a calling
convention attribute.

Thanks,
Matthew


RE: [RFC v2] MIPS ABI Extension for IEEE Std 754 Non-Compliant Interlinking

2016-11-11 Thread Matthew Fortune
Maciej Rozycki  writes:
>  I'm back to this effort now, thanks for patience.

Likewise, this thread got buried in email. At the risk of further delaying
this work I do still have issues with the design.

> > > 3.4 Relocatable Object Generation
> > >
> > >  Tools that produce relocatable objects such as the assembler shall
> > > always produce a SHT_MIPS_ABIFLAGS section according to the IEEE Std
> > > 754 compliance mode selected.  In the absence of any explicit user
> > > instructions the `strict' mode shall be assumed.  No new `legacy'
> > > objects shall be produced.
> >
> > Is it necessary to say that no new legacy objects can be created?
> 
>  It is of course not necessary in the sense that we are free to make a
> decision here rather than being constrained by hardware or another
> technical requirement.
> 
>  However I think only supporting the two explicit `strict' and `relaxed'
> modes makes the solution proposed consistent, and has indeed been the
> basis of my design.

This is the bit I think is actually less consistent than it seems on the
surface. The issue is that we have to punch holes in the design in order
to facilitate those users who don't care about ieee compliance and the
way it is done currently is that we allow the static linker to completely
override the 'strict'ness of an input object when using --ieee=relaxed.

This means that a user consciously creating an object that 'needs' ieee
compliance via use of -fieee=strict or -mieee=strict is thwarted by the
next user who builds the executable. This kind of scenario can occur with
a static library prepared by an expert in floating point and then someone
casually including that into a bigger application. Obviously a similar
issue is present with the rules around executable and shared libraries
where the executable's compliance mode can override a shared library
but at this level we are not losing any information and the executable
has either very specifically been set to 'relaxed' mode or the kernel
has set legacy to mean relaxed. The latter can at least be fixed by
changing the kernel. Losing information in a static link cannot be
fixed.

The assumption in the specification is that a user creating a final
executable is the one who should know about ieee compliance issues but
I am not convinced about this. I think it is at the compilation unit
level that issues of ieee compliance should be decided and that should
be a conscious act of the user (or toolchain packager if they want to
force all software to care). I.e. a strict object is strict without
exception throughout static linkage.

The point about not removing legacy objects is that it is an ideal
container for individual objects that don't care or don't yet know
they care about ieee compliance.

The relaxed variant then only becomes applicable when linking multiple
objects and a user has specifically asked to downgrade legacy to relaxed.
Given a legacy object could already be downgraded to relaxed under the
current spec then this does not affect the behaviour of a legacy object.

What this also means is that we can propagate 'legacy' objects through
relocatable links to still get a legacy object which is consistent with
respect to its nan encoding and only later at final link downgrade,
upgrade or retain the compliance level.

These rules combined also mean that a new toolchain that includes
support for these features will, by default, just continue to behave
as today which is the norm for new features. If a user or toolchain
packager decides to enforce stricter ieee checks then they can and
they can face the complexities of combining old and new software
packages as laid out in this spec.

The static linker rules would be almost the same as now but would
not allow strict to be downgraded to relaxed and the 'default' behaviour
could result in legacy objects if they were the only ones in the link.
The spec is currently a little vague on default linker behaviour when
no --ieee option is given but the rules for combining objects without
an override seem relatively obvious. Any 'relaxed' object leads to
a relaxed executable, any 'strict' object leads to a strict executable,
'strict' and 'relaxed' cannot mix, otherwise 'legacy'. This also
eliminates the need for 'nowarn/warn'.
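To spell that combination rule out as a sketch (the enumerators are
illustrative, not the real ABIFLAGS encoding):

enum ieee_mode { IEEE_LEGACY, IEEE_RELAXED, IEEE_STRICT, IEEE_CONFLICT };

static enum ieee_mode
merge_ieee_mode (enum ieee_mode a, enum ieee_mode b)
{
  if (a == b)
    return a;
  if (a == IEEE_LEGACY)
    return b;                /* legacy takes on the other object's mode */
  if (b == IEEE_LEGACY)
    return a;
  return IEEE_CONFLICT;      /* strict and relaxed cannot mix */
}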

>  The principle I have followed here has been that it's the author of
> software who knows exactly what his code requires for correct operation,
> and sets the correct flags in the Makefile or whatever build system he
> uses.  Joseph's earlier proposal to have a generic `-fieee' option
> rather than a target one fits here very well in that the author can
> stick it in the build system without the exact knowledge of what target-
> specific constraints may be.  Such an option would preferably be placed
> such that it is not accidentally lost e.g. with an OS distribution
> packager's CFLAGS override, which is typically made in a system-wide
> automated way when a distribution is build.
> 
>  Still a distribution pack

RE: Help w/ PR61538?

2014-07-28 Thread Matthew Fortune
Hi Joshua,

I know very little about this area but I'll try and offer some advice anyway...

> On 07/05/2014 23:43, Joshua Kinard wrote:
> > Hi,
> >
> > I filed PR61538 about two weeks ago, regarding gcc-4.8.x and up not
> > compiling a g++/pthreads-linked app correctly on SGI R1x000-based systems
> > (Octane, Onyx2), running Linux.  Running the subsequently-compiled
> > application simply hangs in a futex syscall until terminated via Ctrl+C.
> I
> > suspect it's a double-locking bug of some design, as evidenced by strace
> > showing two consecutive syscall()'s w/ 0x108e passed as the syscall #
> (4238
> > or futex on o32 MIPS), but I am stumped as to what else I can do to debug
> it
> > and help fix it.
> >
> [snip]
> > Full details:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61538
> 
> So I've spent the last few weeks bisecting the gcc tree, and I've narrowed
> down the set of commits that appear to have introduced this problem:
> 
> 1. 39a8c5eaded1e5771a941c56a49ca0a5e9c5eca0  * config/mips/mips.c
> (mips_emit_pre_atomic_barrier_p,)

This is the prime candidate for introducing the issue.

> 2. 974f0a74e2116143b88d8cea8e1dd5a9c18ef96c  * config/mips/constraints.md
> (ZR): New constraint.

Unlikely

> 3. 0f8e46b16a53c02d7255dcd6b6e9b5bc7f8ec953  * config/mips/mips.c
> (mips_process_sync_loop): Emit cmp result only if

Possible but unlikely still

> 4. 30c3c4427521f96fb58b6e1debb86da4f113f06f  * emit-rtl.c
> (need_atomic_barrier_p): New function.

Seems unlikely

> 
> There's a build failure somewhere in the middle of there that is blocking me
> from figuring out which specific one is the cause, but they all appear to be
> related anyways.  All four were added on 2012-06-20.
> 
> When I took a git checkout from 2012-06-26 and reverted those four commits,
> I was able to compile glibc-2.19 and get a working "sln" binary.  I am
> unable to easily test the C++ side because I built the checkouts in my
> $HOME, and it's too risky to try and shoehorn one of them in as the system
> compiler.  However, I think the C++ issue is also fixed by reverting the
> four, as that also involved hanging in Linux futex syscalls.

Here is a wild guess at the problem... I think the workaround for R10000 to
use branch likely instead of delay slot branches is ending up annulling
an instruction that is required for certain atomic operations. This is an
entirely untested theory (and patch) but can you see if this fixes the issue
you are seeing:

@@ -13014,7 +13023,8 @@ mips_process_sync_loop (rtx insn, rtx *operands)
   mips_multi_copy_insn (tmp3_insn);
   mips_multi_set_operand (mips_multi_last_index (), 0, newval);
 }
-  else if (!(required_oldval && cmp))
+  else if (!(required_oldval && cmp)
+   || mips_branch_likely)
 mips_multi_add_insn ("nop", NULL);

   /* CMP = 1 -- either standalone or in a delay slot.  */

I suspect I can weave that in more naturally but can you tell me if that
fixes the problem first.

Regards,
Matthew


RE: Help w/ PR61538?

2014-07-28 Thread Matthew Fortune
I'll switch to replying on PR61538. I had not read all of the ticket
previously, and although I may have found a problem, it seems it may not
be the cause of this failure.

The generated code differences after the patches seem significant, but
I may not get a chance to look at them in detail for a little
while.

Matthew

> -Original Message-
> From: Joshua Kinard [mailto:ku...@gentoo.org]
> Sent: 28 July 2014 10:40
> To: Matthew Fortune; gcc@gcc.gnu.org
> Subject: Re: Help w/ PR61538?
> 
> On 07/28/2014 04:41, Matthew Fortune wrote:
> > Hi Joshua,
> >
> > I know very little about this area but I'll try and offer some advice
> anyway...
> >
> 
> You know more than I do :)
> 
> 
> >> On 07/05/2014 23:43, Joshua Kinard wrote:
> >>> Hi,
> >>>
> >>> I filed PR61538 about two weeks ago, regarding gcc-4.8.x and up not
> >>> compiling a g++/pthreads-linked app correctly on SGI R1x000-based
> systems
> >>> (Octane, Onyx2), running Linux.  Running the subsequently-compiled
> >>> application simply hangs in a futex syscall until terminated via Ctrl+C.
> >> I
> >>> suspect it's a double-locking bug of some design, as evidenced by strace
> >>> showing two consecutive syscall()'s w/ 0x108e passed as the syscall #
> >> (4238
> >>> or futex on o32 MIPS), but I am stumped as to what else I can do to
> debug
> >> it
> >>> and help fix it.
> >>>
> >> [snip]
> >>> Full details:
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61538
> >>
> >> So I've spent the last few weeks bisecting the gcc tree, and I've
> narrowed
> >> down the set of commits that appear to have introduced this problem:
> >>
> >> 1. 39a8c5eaded1e5771a941c56a49ca0a5e9c5eca0  * config/mips/mips.c
> >> (mips_emit_pre_atomic_barrier_p,)
> >
> > This is the prime candidate for introducing the issue.
> 
> This is my guess, too.  However, it appears to tie in w/ the fourth commit
> because the new mips_emit_{pre,post}_atomic_barrier_p functions added in
> commit 39a8c5ea are removed by commit 30c3c442 a mere ~7 minutes later
> (which I find really odd).  Commit 974f0a74 is really the only one that
> seems innocent, but I suspect the other three are linked.  If mkuvyrkov is
> still around, perhaps he could explain better?
> 
> 
> >> 2. 974f0a74e2116143b88d8cea8e1dd5a9c18ef96c  * config/mips/constraints.md
> >> (ZR): New constraint.
> >
> > Unlikely
> >
> >> 3. 0f8e46b16a53c02d7255dcd6b6e9b5bc7f8ec953  * config/mips/mips.c
> >> (mips_process_sync_loop): Emit cmp result only if
> >
> > Possible but unlikely still
> >
> >> 4. 30c3c4427521f96fb58b6e1debb86da4f113f06f  * emit-rtl.c
> >> (need_atomic_barrier_p): New function.
> >
> > Seems unlikely
> >
> >>
> >> There's a build failure somewhere in the middle of there that is blocking
> me
> >> from figuring out which specific one is the cause, but they all appear to
> be
> >> related anyways.  All four were added on 2012-06-20.
> >>
> >> When I took a git checkout from 2012-06-26 and reverted those four
> commits,
> >> I was able to compile glibc-2.19 and get a working "sln" binary.  I am
> >> unable to easily test the C++ side because I built the checkouts in my
> >> $HOME, and it's too risky to try and shoehorn one of them in as the
> system
> >> compiler.  However, I think the C++ issue is also fixed by reverting the
> >> four, as that also involved hanging in Linux futex syscalls.
> >
> > Here is a wild guess at the problem... I think the workaround for R10000
> > to
> > use branch likely instead of delay slot branches is ending up annulling
> > an instruction that is required for certain atomic operations. This is an
> > entirely untested theory (and patch) but can you see if this fixes the
> issue
> > you are seeing:
> 
> Well, the branch-likely thing really only affects a specific revision of the
> R10000 processors.  Later R10000 revisions (3.1+?) and R12000-R16000
> shouldn't be affected.  I've been playing with disabling that specific
> workaround on my Octane's kernel and haven't seen any ill effects yet.
> Though, I haven't tried rebuilding the userland w/ -mno-fix-r10000 just yet.
> 
> If you want, you can take a look at some of the additional info in the
> corresponding Gentoo bug that tracks PR61538:
> 
> https://bugs.gentoo.org/show_bug.cgi?id=516548
> 
> I h

RE: SC: New MIPS maintainers needed

2014-07-29 Thread Matthew Fortune
Jeff Law  writes:
> On 07/22/14 06:56, Richard Sandiford wrote:
> > I'll need to step down as MIPS maintainer this weekend in order to
> avoid
> > a possible conflict of interest with a new job.  SC: please could you
> > appoint some new maintainers to take over?
> We'll get the process started.
> 
> Stepping down doesn't require you to do anything further than removing
> yourself as the listed maintainer for the MIPS port

Hi Richard,

Although I've only worked with you for a relatively short time I'd like
to thank you for all the help and guidance you have given. I'm sure the
MIPS community owes much to your stewardship over the years and I'm
certain that it has been appreciated.

Jeff:

Now that Richard has stood down as the MIPS maintainer can you
advise on the process for getting patches approved? Should we cc
any particular global maintainer or just post and allow someone to get
to it when they can?

Thanks,
Matthew


RE: SC: New MIPS maintainers needed

2014-07-30 Thread Matthew Fortune
Eric Christopher  writes:
> On Tue, Jul 29, 2014 at 5:58 AM, Matthew Fortune
>  wrote:
> > Jeff Law  writes:
> >> On 07/22/14 06:56, Richard Sandiford wrote:
> >> > I'll need to step down as MIPS maintainer this weekend in order to
> >> avoid
> >> > a possible conflict of interest with a new job.  SC: please could you
> >> > appoint some new maintainers to take over?
> >> We'll get the process started.
> >>
> >> Stepping down doesn't require you to do anything further than removing
> >> yourself as the listed maintainer for the MIPS port
> >
> > Hi Richard,
> >
> > Although I've only worked with you for a relatively short time I'd like
> > to thank you for all the help and guidance you have given. I'm sure the
> > MIPS community owes much to your stewardship over the years and I'm
> > certain that it has been appreciated.
> >
> > Jeff:
> >
> > Now that Richard has stood down as the MIPS maintainer can you
> > advise on the process for getting patches approved? Should we cc
> > any particular global maintainer or just post and allow someone to get
> > to it when they can?
> >
> 
> You can technically cc me on it, I've not been able to do a lot of
> work with gcc until recently, but I am now reading the list etc. Patch
> review will take a bit longer than Richard, he's very hard to replace.

My apologies, I knew you were a named MIPS maintainer but didn't know you
were able to work on the codebase again.
 
> I'm also down with you and Catherine being nominated for
> maintainership, you've both been doing a lot of work in the area and
> it makes some sense.

Thanks,
Matthew


Using modes on parallel in vec_select

2014-09-11 Thread Matthew Fortune
We are currently working on the implementation of MSA (SIMD) for MIPS
and are implementing vector interleave instructions which have a
combination of vec_select and vec_concat operators in their patterns.
The selectors for the vec_select operators depend on the vector mode
so to avoid writing multiple patterns we are using this kind of
structure:

(define_insn "msa_ilvev_"
  [(set (match_operand:IMSA 0 "register_operand" "=f")
(vec_select:IMSA (vec_concat:
(match_operand:IMSA 1 "register_operand" "f")
(match_operand:IMSA 2 "register_operand" "f"))
 (match_operand:IMSA 3 "vec_par_const_ev" "")))]

Operand 3 is a parallel which we require to have the same mode as
the vector. This allows the predicate to check for the appropriate
sequence of element selectors based on the mode.
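For reference, the selector such a predicate would accept is built at expand
time roughly like this (the index order shown is only an example; the real
order depends on the instruction's semantics and on endianness):

/* A V4SImode parallel selecting four elements out of the concatenation of
   the two input vectors.  */
rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (2), GEN_INT (4), GEN_INT (6));
rtx sel = gen_rtx_PARALLEL (V4SImode, v);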

The question is whether it is acceptable to require a mode on the
parallel that forms the element selector?

I.e. will this requirement prevent any of the standard optimisation
passes (such as combine) from speculatively matching this pattern?

The mode could obviously just be encoded in the predicate name, at the
cost of having more predicates, but would the current approach cause any
problems?

Thanks,
Matthew


RE: MIPS Maintainers

2014-09-28 Thread Matthew Fortune
Thanks to all.

Matthew

> -Original Message-
> From: Eric Christopher [mailto:echri...@gmail.com]
> Sent: 26 September 2014 21:07
> To: Jeff Law
> Cc: Moore, Catherine; Matthew Fortune; GCC
> Subject: Re: MIPS Maintainers
> 
> Congratulations guys!
> 
> -eric
> 
> On Fri, Sep 26, 2014 at 1:01 PM, Jeff Law  wrote:
> >
> > Sorry this has taken so long, the delays have been totally mine in not
> > following-up to get votes, then tally them from the steering committee.
> >
> > I'm pleased to announce that Catherine Moore and Matthew Fortune have been
> > appointed as maintainers for the MIPS port.
> >
> > Catherine & Matthew, please update the MAINTAINERS file appropriately.
> >
> > Thanks for everyone's patience,
> > Jeff


Vector modes and the corresponding width integer mode

2014-12-11 Thread Matthew Fortune
Hi,

I'm working on MIPS SIMD support for MSA. Can anyone point me towards
information about the need for an integer mode of equal size to any
supported vector mode?

I.e. if I support V4SImode is there any core GCC requirement that
TImode is also supported?

Any guidance is appreciated. The MIPS port already has limited support
for TImode for 64-bit targets which makes it all the more difficult to
figure out if there is a relationship between vector modes and integer
modes.

Thanks,
Matthew


RE: Vector modes and the corresponding width integer mode

2014-12-12 Thread Matthew Fortune
Hi Bingfeng,

Thanks for commenting. It's reassuring to know that at least some ports
do not have the corresponding integer modes. I have now also understood
some of the background to the extra integer modes in ARM NEON and as
far as I can tell the integer modes represent an opaque view of the
registers as required by the ABI but not for anything relating to
vectorization. I tried to follow some of the X86 code too but it's a bit
too complex to dip into and understand much. I hadn't got to PowerPC
yet either.

Anyway I've ripped out the TImode handling from our current MSA
implementation and it did not immediately blow up.

Thanks,
Matthew

> -Original Message-
> From: Bingfeng Mei [mailto:b...@broadcom.com]
> Sent: 12 December 2014 11:52
> To: Matthew Fortune; gcc@gcc.gnu.org
> Subject: RE: Vector modes and the corresponding width integer mode
> 
> I don't think it is required. For example, PowerPC port supports
> V8SImode, but I don't see OImode. Just sometimes it could come handy to
> have the equal size scalar mode.
> 
> Cheers,
> Bingfeng
> 
> > -Original Message-
> > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
> > Of Matthew Fortune
> > Sent: 11 December 2014 13:27
> > To: gcc@gcc.gnu.org
> > Subject: Vector modes and the corresponding width integer mode
> >
> > Hi,
> >
> > I'm working on MIPS SIMD support for MSA. Can anyone point me towards
> > information about the need for an integer mode of equal size to any
> > supported vector mode?
> >
> > I.e. if I support V4SImode is there any core GCC requirement that
> > TImode is also supported?
> >
> > Any guidance is appreciated. The MIPS port already has limited support
> > for TImode for 64-bit targets which makes it all the more difficult to
> > figure out if there is a relationship between vector modes and integer
> > modes.
> >
> > Thanks,
> > Matthew


RE: Localized write permission for OS maintainers

2014-12-18 Thread Matthew Fortune
> Does this cover OS specific areas in the gcc/config.gcc file?  For
> example:
> 
> https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01214.html

I can’t answer authoritatively, but I would expect it does include
OS-specific areas of config files. However, as an arch port maintainer
I would be wary of an OS maintainer tweaking something that is
arch-specific OS config. Hope that makes sense.

Thanks,
Matthew



RE: Support for architectures without hardware interlocks

2015-01-08 Thread Matthew Fortune
> On 1/8/2015 9:01 AM, Eric Botcazou wrote:
> >> I've worked on a gcc target that was porting an architecture without
> >> hardware interlock support. Basically, you need to emit nop
> >> operations to avoid possible hw conflicts. At the moment, this was
> >> done by patching the gcc scheduler to do so, Another issue to keep is
> >> to check for hardware conflicts across basic-block boundaries. And
> >> not the last, is to prohibit/avoid any instruction stream
> >> modification after scheduler (e.g., peephole optimizations etc.).
> > That's an overly complex approach, this usually can be done in a
> > simpler way with a machine-specific pass that runs at the end of the
> RTL pipeline.
> >
> Isn't this similar to needing to fill a delay slot after a branch
> instruction? My recollection is that some SPARC and MIPS have to deal
> with that.

MIPS has two ways of dealing with hazards. Where the delay slot filler has
not filled a delay slot, the branch output patterns have a print modifier
that is printed as a nop. For other hazards (which are more like a lack of
hardware interlocks) there is a MIPS-specific reorg pass that looks
for hazards in the instruction stream and emits an appropriate number
of NOPs. See mips_avoid_hazard and related code to see roughly how it
works.
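In outline the pass does something like the following (hazard_nops_after is
a made-up helper, and the real code also checks whether the next instruction
actually consumes the hazarded result before paying the cost):

extern int hazard_nops_after (rtx_insn *);

/* Walk the instruction stream and pay off each hazard with NOPs before the
   next real instruction issues.  */
static void
insert_hazard_nops (rtx_insn *insns)
{
  int pending = 0;

  for (rtx_insn *insn = insns; insn; insn = NEXT_INSN (insn))
    {
      if (!INSN_P (insn))
        continue;
      for (; pending > 0; pending--)
        emit_insn_before (gen_nop (), insn);
      pending = hazard_nops_after (insn);
    }
}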

Matthew

> 
> --
> Joel Sherrill, Ph.D. Director of Research & Development
> joel.sherr...@oarcorp.comOn-Line Applications Research
> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
> Support Available(256) 722-9985



RE: Cross compiling and multiple sysroot question

2015-01-12 Thread Matthew Fortune
> On Mon, 12 Jan 2015, Steve Ellcey wrote:
> 
> > MULTILIB_OSDIRNAMES += mips32r2=mipsr2/lib MULTILIB_OSDIRNAMES +=
> > .=mipsr2/lib
> >
> > I don't think the first one would work because -mips32r2 is the
> > default architecture and is not explicitly listed in MULTILIB_OPTIONS
> > and I don't think the second form is supported at all, but maybe there
> > is some other way to specify the location of the default libraries?
> 
> The default libraries for x86_64-linux-gnu go in ../lib64, so it's
> clearly possible by doing something sufficiently similar to how it's
> done for x86_64.

MIPS does this too for mips64-linux-gnu, as it has n32 for the default
multilib, which gets placed in lib32. I don't honestly know how the multilib
spec avoids ending up building four multilibs though. I'm assuming the fact
that the default ABI is added to DRIVER_SELF_SPECS may be the reason.
t-linux64:
MULTILIB_OPTIONS = mabi=n32/mabi=32/mabi=64
MULTILIB_DIRNAMES = n32 32 64
MIPS_EL = $(if $(filter %el, $(firstword $(subst -, ,$(target)))),el)
MIPS_SOFT = $(if $(strip $(filter MASK_SOFT_FLOAT_ABI, $(target_cpu_default)) $(filter soft, $(with_float))),soft)
MULTILIB_OSDIRNAMES = \
	../lib32$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
	../lib$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
	../lib64$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))

Matthew


RE: limiting call clobbered registers for library functions

2015-01-30 Thread Matthew Fortune
Yury Gribov  writes:
> On 01/29/2015 08:32 PM, Richard Henderson wrote:
> > On 01/29/2015 02:08 AM, Paul Shortis wrote:
> >> I've ported GCC to a small 16 bit CPU that has single bit shifts. So
> >> I've handled variable / multi-bit shifts using a mix of inline shifts
> >> and calls to assembler support functions.
> >>
> >> The calls to the asm library functions clobber only one (by const) or
> >> two
> >> (variable) registers but of course calling these functions causes all
> >> of the standard call clobbered registers to be considered clobbered,
> >> thus wasting lots of candidate registers for use in expressions
> >> surrounding these shifts and causing unnecessary register saves in
> the surrounding function prologue/epilogue.
> >>
> >> I've scrutinized and cloned the actions of other ports that do the
> >> same, however I'm unable to convince the various passes that only r1
> >> and r2 can be clobbered by these library calls.
> >>
> >> Is anyone able to point me in the proper direction for a solution to
> >> this problem ?
> >
> > You wind up writing a pattern that contains a call, but isn't
> > represented in rtl as a call.
> 
> Could it be useful to provide a pragma for specifying function register
> usage? This would allow e.g. library writer to write a hand-optimized
> assembly version and then inform compiler of it's binary interface.
> 
> Currently a surrogate of this can be achieved by putting inline asm code
> in static inline functions in public library headers but this has it's
> own disadvantages (e.g. code bloat).

This sounds like a good idea in principle. I seem to recall seeing something
similar to this in other compiler frameworks that allow a number of special
calling conventions to be defined and enable functions to be attributed to use
one of them. I.e. not quite so general as specifying an arbitrary clobber list
but some sensible pre-defined alternative conventions.
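
For reference, the header surrogate Yury mentions usually looks something
like the sketch below for the kind of target Paul describes; everything
here is hypothetical (the register names, the helper routine and its
binary interface):

/* Hypothetical 16-bit target: a shift helper known to touch only
   r1/r2.  The wrapper pins the operands to those registers so the
   compiler knows nothing else is clobbered.  */
static inline unsigned short
lib_lshift (unsigned short x, unsigned short n)
{
  register unsigned short val asm ("r1") = x;
  register unsigned short amt asm ("r2") = n;
  asm ("jsr __lshift_helper" : "+r" (val), "+r" (amt));
  return val;
}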

Thanks,
Matthew


RE: unfused fma question

2015-02-22 Thread Matthew Fortune
Steve Ellcey  writes:
> Or one could change convert_mult_to_fma to add a check if fma is fused
> vs. non-fused in addition to the check for the flag_fp_contract_mode
> in order to decide whether to convert expressions into an fma and then
> define fma instructions in the md file.

I was about to say that I see no reason to change how non-fused multiply
adds work, i.e. leave them to pattern matching, but I think your point was
about what we should do when both fused and non-fused patterns are
available.

> I was wondering if anyone had an opinion about the advantages or
> disadvantages of these two approaches.

I expect that fused multiply adds are almost always faster in hardware
owing to the lack of rounding so using them eagerly, when fp-contract
allows, may still be best even if non-fused alternatives are available.
That depends on the relative cost of the two alternatives though.
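
As a concrete example of the shape being discussed (plain C, nothing target
specific): with -ffp-contract=fast the expression below may be turned into
a single fused multiply-add by convert_mult_to_fma, while with
-ffp-contract=off the product must be rounded before the addition, so only
a non-fused sequence (or an unfused madd instruction, where one exists) is
acceptable.

/* Candidate for multiply-add contraction.  */
double
madd (double a, double b, double c)
{
  return a * b + c;
}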

Matthew


RE: wrong mirror on GCC mirror sites page

2015-03-09 Thread Matthew Fortune
Conrad S  writes:
> On 9 March 2015 at 23:08, Jonathan Wakely  wrote:
> >> How did this get into the mirror list?
> >
> > Because they said they would provide mirrors:
> > https://gcc.gnu.org/ml/gcc/2014-06/msg00251.html
> > https://gcc.gnu.org/ml/gcc/2014-07/msg00156.html
> 
> Upon closer inspection there's actually more junk in the mirror list
> site:
> 
> Australia: http://mirrors-au.go-parts.com/gcc
> Russia: http://mirrors-ru.go-parts.com/gcc
> UK: http://mirrors-uk.go-parts.com/gcc/
> US: http://mirrors-usa.go-parts.com/gcc

The last three here appear to work. I think you just got unlucky that the
mirrors-au one is broken at the moment.

Matthew


RE: IRA preferencing issues

2015-04-17 Thread Matthew Fortune
Wilco Dijkstra  writes:
> While investigating why the IRA preferencing algorithm often chooses
> incorrect preferences from the costs, I noticed this thread:
> https://gcc.gnu.org/ml/gcc/2011-05/msg00186.html
> 
> I am seeing the exact same issue on AArch64 - during the final
> preference selection ira-costs takes the union of any register classes
> that happen to have equal cost. As a result many registers get ALL_REGS
> as the preferred register eventhough its cost is much higher than either
> GENERAL_REGS or FP_REGS. So we end up with lots of scalar SIMD
> instructions and expensive int<->FP moves in integer code when register
> pressure is high. When the preference is computed correctly as in the
> proposed patch (choosing the first class with lowest cost, ie.
> GENERAL_REGS) the resulting code is much more efficient, and there are
> no spurious SIMD instructions.
> 
> Choosing a preferred class when it doesn't have the lowest cost is
> clearly incorrect. So is there a good reason why the proposed patch
> should not be applied? I actually wonder why we'd ever need to do a
> union - if there are 2 classes with equal cost, you'd use the 2nd as the
> alternative class.
> 
> The other question I had is whether there is a good way to get improve
> the preference in cases like this and avoid classes with equal cost
> altogether. The costs are clearly not equal: scalar SIMD instructions
> have higher latency and require extra int<->FP moves. It is possible to
> mark variants in the MD patterns using '?' to discourage them but that
> seems like a hack, just like '*'. Is there a general way to say that
> GENERAL_REGS is preferred over FP_REGS for SI/DI mode?

MIPS has the same problem here and we have been looking at ways to address
it purely via costings rather than changing IRA. What we have done so
far is to make the cost of a move from GENERAL_REGS to FP_REGS more
expensive than memory if the move has an integer mode. The goal for MIPS
is to never allocate an FP register to an integer mode unless it was
absolutely necessary owing to an integer to fp conversion where the
integer has to be put in an FP register. Ideally I'd like a guarantee
that FP registers will never be used unless a floating point type is
present in the source but I haven't found a way to do that given the
FP-int conversion issue requiring SImode to be allowed in FP regs.
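
In hook terms the idea is roughly the following (a simplified sketch, not
the actual mips.c change; the numbers are purely illustrative and the real
code keys off the port's memory move cost):

/* Sketch for TARGET_REGISTER_MOVE_COST: penalise GPR<->FPR moves for
   integer modes so IRA stops treating FP_REGS as a cheap home for
   integer pseudos.  */
static int
sketch_register_move_cost (machine_mode mode, reg_class_t from, reg_class_t to)
{
  bool gpr_to_fpr = (reg_class_subset_p (from, GENERAL_REGS)
                     && reg_class_subset_p (to, FP_REGS));
  bool fpr_to_gpr = (reg_class_subset_p (from, FP_REGS)
                     && reg_class_subset_p (to, GENERAL_REGS));
  if ((gpr_to_fpr || fpr_to_gpr) && !FLOAT_MODE_P (mode))
    return 12;  /* Deliberately above the memory move cost.  */
  return 2;     /* Nominal register-register cost.  */
}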

The patch for MIPS is not submitted yet but has eliminated the final
two uses of FP registers when building the whole Linux kernel with
hard-float enabled. I am however still not confident enough to say
you can build integer only code with hard-float and never touch an FP
register.

Since there are multiple architectures suffering from this I guess we
should look at properly addressing it in generic code.

Thanks,
Matthew


[RFC] Further LRA subreg handling issues

2017-01-16 Thread Matthew Fortune
Hi Vladimir,

I'm working on PR target/78660 which is looking like a latent LRA bug.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660

I believe the problem is in the same area as a bug was fixed in 2015:

https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02165.html

Eric pointed out that the new issue relates to something reload
specifically dealt with in reload1.c:eliminate_regs_1:

  if (MEM_P (new_rtx)
  && ((x_size < new_size
   /* On RISC machines, combine can create rtl of the form
  (set (subreg:m1 (reg:m2 R) 0) ...)
  where m1 < m2, and expects something interesting to
  happen to the entire word.  Moreover, it will use the
  (reg:m2 R) later, expecting all bits to be preserved.
  So if the number of words is the same, preserve the
  subreg so that push_reload can see it.  */
   && !(WORD_REGISTER_OPERATIONS
&& (x_size - 1) / UNITS_PER_WORD
   == (new_size -1 ) / UNITS_PER_WORD))
  || x_size == new_size)
  )
return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
  else
return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));

However the code in lra-constraints.c:curr_insn_transform does not appear
to make any attempt to handle a special case for WORD_REGISTER_OPERATIONS.
I tried the following patch to account for this, which 'works' but I'm not
at all sure what the conditions should be (the comment from reload will
need adapting and including as well):

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 260591a..ac8d116 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4086,7 +4086,9 @@ curr_insn_transform (bool check_only_p)
  && (goal_alt[i] == NO_REGS
  || (simplify_subreg_regno
  (ira_class_hard_regs[goal_alt[i]][0],
-  GET_MODE (reg), byte, mode) >= 0)
+  GET_MODE (reg), byte, mode) >= 0)))
+ || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
+ && WORD_REGISTER_OPERATIONS)))
{
  if (type == OP_OUT)
type = OP_INOUT;

I think at the very least the issue Richard pointed out in the previous
fix must be dealt with, as the new testcase triggers exactly what he
described, I believe.

Richard Sandiford wrote:
> So IMO the patch is too broad.  I think it should only use INOUT reloads
> for !strict_low if the inner mode is wider than a word and the outer mode
> is strictly narrower than the inner mode.  That's on top of Vlad's
> comment about only converting OP_OUTs, of course.

And here is my attempt at dealing with that:

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index ac8d116..8a0f40f 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4090,7 +4090,17 @@ curr_insn_transform (bool check_only_p)
  || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
  && WORD_REGISTER_OPERATIONS)))
{
- if (type == OP_OUT)
+ /* An OP_INOUT is required when reloading a subreg of a
+mode wider than a word to ensure that data beyond the
+word being reloaded is preserved.  Also automatically
+ensure that strict_low_part reloads are made into
+OP_INOUT which should already be true from the backend
+constraints.  */
+ if (type == OP_OUT
+ && (curr_static_id->operand[i].strict_low
+ || (GET_MODE_SIZE (GET_MODE (reg)) > UNITS_PER_WORD
+ && GET_MODE_SIZE (mode)
+< GET_MODE_SIZE (GET_MODE (reg)
type = OP_INOUT;
  loc = &SUBREG_REG (*loc);
  mode = GET_MODE (*loc);

Any thoughts on whether this is along the right track would be appreciated.

Thanks,
Matthew


RE: [RFC] Further LRA subreg handling issues

2017-01-19 Thread Matthew Fortune
Vladimir Makarov  writes:
> On 01/16/2017 10:47 AM, Matthew Fortune wrote:
> > Hi Vladimir,
> >
> > I'm working on PR target/78660 which is looking like a latent LRA bug.
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660
> >
> > I believe the problem is in the same area as a bug was fixed in 2015:
> >
> > https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02165.html
> >
> > Eric pointed out that the new issue relates to something reload
> > specifically dealt with in reload1.c:eliminate_regs_1:
> >
> >   if (MEM_P (new_rtx)
> >   && ((x_size < new_size
> >/* On RISC machines, combine can create rtl of the form
> >   (set (subreg:m1 (reg:m2 R) 0) ...)
> >   where m1 < m2, and expects something interesting to
> >   happen to the entire word.  Moreover, it will use the
> >   (reg:m2 R) later, expecting all bits to be preserved.
> >   So if the number of words is the same, preserve the
> >   subreg so that push_reload can see it.  */
> >&& !(WORD_REGISTER_OPERATIONS
> > && (x_size - 1) / UNITS_PER_WORD
> >== (new_size -1 ) / UNITS_PER_WORD))
> >   || x_size == new_size)
> >   )
> > return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
> >   else
> > return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
> >
> > However the code in lra-constraints.c:curr_insn_transform does not appear
> > to make any attempt to handle a special case for WORD_REGISTER_OPERATIONS.
> > I tried the following patch to account for this, which 'works' but I'm not
> > at all sure what the conditions should be (the comment from reload will
> > need adapting and including as well):
> >
> > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> > index 260591a..ac8d116 100644
> > --- a/gcc/lra-constraints.c
> > +++ b/gcc/lra-constraints.c
> > @@ -4086,7 +4086,9 @@ curr_insn_transform (bool check_only_p)
> >   && (goal_alt[i] == NO_REGS
> >   || (simplify_subreg_regno
> >   (ira_class_hard_regs[goal_alt[i]][0],
> > -  GET_MODE (reg), byte, mode) >= 0)
> > +  GET_MODE (reg), byte, mode) >= 0)))
> > + || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
> > + && WORD_REGISTER_OPERATIONS)))
> > {
> >   if (type == OP_OUT)
> > type = OP_INOUT;
> >
> > I think at the very least the issue Richard pointed out in the previous
> > fix must be dealt with as the new testcase triggers exactly what he
> > described I believe
> >
> > Richard Sandiford wrote:
> >> So IMO the patch is too broad.  I think it should only use INOUT reloads
> >> for !strict_low if the inner mode is wider than a word and the outer mode
> >> is strictly narrower than the inner mode.  That's on top of Vlad's
> >> comment about only converting OP_OUTs, of course.
> > And here is my attempt at dealing with that:
> >
> > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> > index ac8d116..8a0f40f 100644
> > --- a/gcc/lra-constraints.c
> > +++ b/gcc/lra-constraints.c
> > @@ -4090,7 +4090,17 @@ curr_insn_transform (bool check_only_p)
> >   || (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (reg))
> >   && WORD_REGISTER_OPERATIONS)))
> > {
> > - if (type == OP_OUT)
> > + /* An OP_INOUT is required when reloading a subreg of a
> > +mode wider than a word to ensure that data beyond the
> > +word being reloaded is preserved.  Also automatically
> > +ensure that strict_low_part reloads are made into
> > +OP_INOUT which should already be true from the backend
> > +constraints.  */
> > + if (type == OP_OUT
> > + && (curr_static_id->operand[i].strict_low
> > + || (GET_MODE_SIZE (GET_MODE (reg)) > UNITS_PER_WORD
> > + && GET_MODE_SIZE (mode)
> > +< GET_MODE_SIZE (GET_MODE (reg)
> > type = OP_INOUT;
> > 

RE: [RFC] Further LRA subreg handling issues

2017-01-25 Thread Matthew Fortune
Eric Botcazou  writes:
> > I'll run testing for at least x86_64, MIPS and another
> > WORD_REGISTER_OPERATIONS target and try to get this committed in the
> > next couple of days so it can get into everyone's testing well before
> > release.
> 
> No issues found on SPARC.

Thanks Eric.

I'm still bootstrap testing this on MIPS. It's taking longer as although
the fix allows the bootstrap to continue there are now stage2/stage3
differences. The bug appears to be essentially the same issue as I just
fixed but in reverse i.e. a paradoxical sub-reg loading too much data
from its stack slot. Extracts from dumps are below; first of all the
input rtl to the combine pass:

(insn 248 246 249 38 (set (reg:QI 282)
(subreg:QI (reg:SI 300) 0)) "/home/mfortune/gcc/gcc/predict.c":2904 362 {*movqi_internal}
 (nil))
(insn 249 248 250 38 (set (reg:DI 284)
(zero_extend:DI (reg:QI 282))) "/home/mfortune/gcc/gcc/predict.c":2904 216 {*zero_extendqidi2}
 (expr_list:REG_DEAD (reg:QI 282)
(nil)))
(insn 250 249 251 38 (set (reg:DI 6 $6)
(reg:DI 284)) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
 (expr_list:REG_DEAD (reg:DI 284)
(nil)))

Trying 248 -> 249:
Successfully matched this instruction:
(set (reg:DI 284)
(subreg:DI (reg:SI 300) 0))
allowing combination of insns 248 and 249
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 248.
modifying insn i3   249: r284:DI=r300:SI#0
deferring rescan insn with uid = 249.

Trying 249 -> 250:
Successfully matched this instruction:
(set (reg:DI 6 $6)
(subreg:DI (reg:SI 300) 0))
allowing combination of insns 249 and 250
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 249.
modifying insn i3   250: $6:DI=r300:SI#0
deferring rescan insn with uid = 250.

Which results in the following coming out of combine:

(note 248 246 249 38 NOTE_INSN_DELETED)
(note 249 248 250 38 NOTE_INSN_DELETED)
(insn 250 249 251 38 (set (reg:DI 6 $6)
(subreg:DI (reg:SI 300) 0)) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
 (nil))

Pseudo 300 is assigned to memory and then LRA produces a simple DImode load from
the assigned stack slot. The only instruction to set pseudo 300 is:

(insn 247 212 389 3 (set (reg:SI 300)
(ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
(const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904 504 {*sne_zero_sisi}
 (nil))

Which leads to an SImode store to the stack slot:

(insn 247 392 393 3 (set (reg:SI 4 $4 [300])
(ne:SI (reg:SI 20 $20 [orig:231 taken ] [231])
(const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904 504 {*sne_zero_sisi}
 (nil))
(insn 393 247 389 3 (set (mem/c:SI (plus:DI (reg/f:DI 29 $sp)
(const_int 16 [0x10])) [403 %sfp+16 S4 A64])
(reg:SI 4 $4 [300])) "/home/mfortune/gcc/gcc/predict.c":2904 312 {*movsi_internal}
 (nil))
...

(note 248 246 249 40 NOTE_INSN_DELETED)
(note 249 248 256 40 NOTE_INSN_DELETED)
(note 256 249 250 40 NOTE_INSN_DELETED)
(insn 250 256 251 40 (set (reg:DI 6 $6)
(mem/c:DI (plus:DI (reg/f:DI 29 $sp)
(const_int 16 [0x10])) [403 %sfp+16 S8 A64])) "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
 (nil))

My assumption is that LRA is again expected to deal with this case and for insn
250 should be recognising that it must load 32-bits and rely on implicit
LOAD_EXTEND_OP behaviour producing an acceptable 64-bit value. In this case it
does not matter whether it is sign or zero extension and my assumption is that
this construct would never appear if a specific sign or zero extension was
required.
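
For reference, the two target properties this assumption rests on look
roughly like this on a MIPS64-style port (a simplified sketch; see mips.h
for the exact definitions):

/* Register operations act on the whole word.  */
#define WORD_REGISTER_OPERATIONS 1

/* A narrower-than-word load (e.g. LW on a 64-bit MIPS target) always
   writes a full register; for MIPS the extension is a sign-extension.  */
#define LOAD_EXTEND_OP(MODE) \
  (GET_MODE_SIZE (MODE) < UNITS_PER_WORD ? SIGN_EXTEND : UNKNOWN)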

I haven't yet looked at where the issue is this time but it seems different,
as this is a subreg in a simple move instruction where we already support
the load/store directly, so no new reload instruction is required. I don't
know if this implies that simple move patterns should reject subregs but
that doesn't sound right either.

Resolving this fixes at least one bug and potentially all bugs in the MIPS
bootstrap, as I manually modified the generated assembly code to use LW
instead of LD for insn 250 and one of the buggy stage 3 objects is fixed.

I'll keep thinking, any advice in the meantime is appreciated.

Thanks,
Matthew


RE: [RFC] Further LRA subreg handling issues

2017-01-26 Thread Matthew Fortune
Matthew Fortune  writes:
...
> Pseudo 300 is assigned to memory and then LRA produces a simple DImode
> load from the assigned stack slot. The only instruction to set pseudo
> 300 is:
> 
> (insn 247 212 389 3 (set (reg:SI 300)
> (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
> (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904
> 504 {*sne_zero_sisi}
>  (nil))
> 
> Which leads to an SImode store to the stack slot:
> 
> (insn 247 392 393 3 (set (reg:SI 4 $4 [300])
> (ne:SI (reg:SI 20 $20 [orig:231 taken ] [231])
> (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904
> 504 {*sne_zero_sisi}
>  (nil))
> (insn 393 247 389 3 (set (mem/c:SI (plus:DI (reg/f:DI 29 $sp)
> (const_int 16 [0x10])) [403 %sfp+16 S4 A64])
> (reg:SI 4 $4 [300])) "/home/mfortune/gcc/gcc/predict.c":2904 312
> {*movsi_internal}
>  (nil))
> ...
> 
> (note 248 246 249 40 NOTE_INSN_DELETED)
> (note 249 248 256 40 NOTE_INSN_DELETED)
> (note 256 249 250 40 NOTE_INSN_DELETED)
> (insn 250 256 251 40 (set (reg:DI 6 $6)
> (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
> (const_int 16 [0x10])) [403 %sfp+16 S8 A64]))
> "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
>  (nil))
> 
> My assumption is that LRA is again expected to deal with this case and
> for insn
> 250 should be recognising that it must load 32-bits and rely on implicit
> LOAD_EXTEND_OP behaviour producing an acceptable 64-bit value. In this
> case it does not matter whether it is sign or zero extension and my
> assumption is that this construct would never appear if a specific sign
> or zero extension was required.
> 
> I haven't got to looking at where the issue is this time but it seems
> different as this is a subreg in a simple move instruction where we
> already support the load/ store directly so no new reload instruction is
> required. I don't know if this implies that simple move patterns should
> reject subregs but that doesn't sound right either.
> 
> Resolving this fixes at least one bug and potentially all bugs in the
> MIPS bootstrap as I manually modified the generated assembly code to use
> LW instead of LD for insn 250 and one of the buggy stage 3 objects is
> fixed.
> 
> I'll keep thinking, any advice in the meantime is appreciated.

All I have been able to determine on this is that there is potentially
different behaviour for paradoxical subregs in LRA vs reload.  There is
this comment in reload.c:push_reload:

If we have (SUBREG:M1 (MEM:M2 ...) ...) (or an inner REG that is still
 a pseudo and hence will become a MEM) with M1 wider than M2 and the
 register is a pseudo, also reload the inside expression.

To me this makes perfect sense as I believe the RTL is only saying that
there is an M2-mode object to access or at least only the M2-mode sized
bits are valid. There are comments to say there will always be sufficient
memory assigned for spill slots as they are sized to fit the largest
paradoxical subreg, I just don't know why that is useful/important.

However in lra-constraints.c:simplify_operand_subreg it quite happily
performs a reload using the outer mode in this case and only drops down to
the inner mode if the outer mode reload would be slower than the inner.

Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as the
junk upper bits in registers will be ignored; On WORD_REGISTER_OPERATIONS
targets then the narrower-than-word mode load will take care of any
'magic' needed to set the upper bits to a safe value in register.

So my thinking is that at least WORD_REGISTER_OPERATIONS targets should
always reload the inner mode for the case mentioned above much like the same
is required for normal subregs. Does that seem reasonable? Have I
misunderstood the paradoxical subreg case entirely?

I've only done superficial testing of a change to this code so far but my
testcase starts working at least which is a start.

Thanks,
Matthew


RE: [RFC] Further LRA subreg handling issues

2017-01-26 Thread Matthew Fortune
Eric Botcazou  writes:
> > However in lra-constraints.c:simplify_operand_subreg it quite happily
> > performs a reload using the outer mode in this case and only drops
> > down to the inner mode if the outer mode reload would be slower than
> the inner.
> >
> > Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as
> > the junk upper bits in registers will be ignored; On
> > WORD_REGISTER_OPERATIONS targets then the narrower-than-word mode load
> > will take care of any 'magic' needed to set the upper bits to a safe
> value in register.
> 
> Yes, I was leaning to the same conclusion before reading your second
> message.
> 
> > So my thinking is that at least WORD_REGISTER_OPERATIONS targets
> > should always reload the inner mode for the case mentioned above much
> > like the same is required for normal subregs. Does that seem
> > reasonable? Have I misunderstood the paradoxical subreg case entirely?
> 
> No, this is correct, see find_reloads:
> 
> /* We must force a reload of paradoxical SUBREGs
>of a MEM because the alignment of the inner value
>may not be enough to do the outer reference.  On
>big-endian machines, it may also reference outside
>the object.
> 
>On machines that extend byte operations and we have a
>SUBREG where both the inner and outer modes are no wider
>than a word and the inner mode is narrower, is integral,
>and gets extended when loaded from memory, combine.c has
>made assumptions about the behavior of the machine in such
>register access.  If the data is, in fact, in memory we
>must always load using the size assumed to be in the
>register and let the insn do the different-sized
>accesses.

This part suggests to me that LRA should never be reloading the
paradoxical subreg meaning the whole SLOW_UNALIGNED_ACCESS checking code in
simplify_operand_subreg could be removed unconditionally.  But I get the
feeling the big valid_address_p check (below) will still prevent some
paradoxical subregs from being reloaded via their inner mode.  I haven't
quite understood exactly what the check is trying to achieve yet though:

  if (!addr_was_valid
  || valid_address_p (GET_MODE (subst), XEXP (subst, 0),
  MEM_ADDR_SPACE (subst))
  || ((get_constraint_type (lookup_constraint
(curr_static_id->operand[nop].constraint))
   != CT_SPECIAL_MEMORY)
  /* We still can reload address and if the address is
 valid, we can remove subreg without reloading its
 inner memory.  */
  && valid_address_p (GET_MODE (subst),
  regno_reg_rtx
  [ira_class_hard_regs
   [base_reg_class (GET_MODE (subst),
MEM_ADDR_SPACE (subst),
ADDRESS, SCRATCH)][0]],
  MEM_ADDR_SPACE (subst
{

>This is doubly true if WORD_REGISTER_OPERATIONS.  In
>this case eliminate_regs has left non-paradoxical
>subregs for push_reload to see.  Make sure it does
>by forcing the reload.

This statement covers the fix I already proposed but perhaps
simplify_operand_subreg can also hit this issue if a 'normal' subreg appears
in an instruction where registers and memory are supported (like move
instructions). In this case the constraints are satisfied and the fix I
proposed would never get run but simplify_operand_subreg would.

Eric: I see you recently had to modify the code I'm talking about in the post
below. Out of interest... was this another issue brought to light by the
improvements to zero extension elimination?

https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01202.html

Matthew


[RFD] Simplifying subregs in LRA

2017-02-01 Thread Matthew Fortune
Hi all,

I've copied you as you have each made some significant change to a function
in LRA which I guess makes you de-facto experts.

I've spent a while researching the history of simplify_operand_subreg and
in particular the behaviour for subregs of memory.  For my sake, if no-one
else's, here is a rundown of its evolution; corrections welcome.

(r192719 git:c6a6cda)
The original code identified a few special cases where a subreg could
be trivially eliminated.  Otherwise it introduced a reload for the inner
expression in expectation of the new subreg(reload-reg) to be handled
as part of operand reloading or get eliminated on the next iteration

(r198344 git:ea99c7a)
A special case for an LRA-introduced subreg was added (LRA_SUBREG_P) that
should always be considered valid.  This I believe is to cope with cases
where there are operands required to match but with different modes and,
presumably, one of the modes is not actually allowed.  Not 100% sure what
this is though!

(r203169 git:c533414)
A special case for a paradoxical subreg (wider than a word) was added to
handle the case where the outer mode requires more registers than are
available in the allocno class.  There is a seemingly pointless call to
PUT_MODE in this which from what I can see will set the same mode as
the reload register was just created with: "PUT_MODE (new_reg, mode);"

(r207007 git:9c8190e)
A special case to ensure a subreg(pseudo-reg) will get spilled to avoid
looping in LRA.

It now starts to get interesting.

(r211715 git:1a68e83)
Only simplify a subreg of memory if the new address is valid or that the
original address wasn't valid. 

(r220297 git:1aae95e)
Introduces 'innermode' local which as far as I can tell is going to be
identical to the reg_mode argument to simplify_operand_subreg.  References
to reg_mode are not fixed as part of this.  This is a significant source
of confusion in the code.
Also simplifies subreg of CONSTANT_P expressions.

(r233107 git:401bd0c)
Also allow elimination of a subreg of memory when the address component
of the new MEM can be reloaded.  Note that at this point we are specifically
allowing new MEMs to exist with invalid addresses that will be fixed up
later but this is OK as they will get reloaded.

(r239342 git:2d2b78a)
Introduces MEM subreg simplification if the inner mode access would have
been slow anyway (a check that gets fixed in d7671d7). Also introduces a
double reload to satisfy some very specific rs6000 backend requirements.
This case requires an outer-mode reload but is implemented as a double
reload rather than just doing the outer mode reload and letting the next
iteration take care of the following step. It is not clear whether this
change actually fixes the same reported issues in two different ways due
to the change on the condition for simplifying the MEM subreg and the
double reload trick. 

This change also affects almost all MEM subreg reloads as previously
any MEM subreg that was not simplified would have used the common innermode
reload logic later in the function; see code guarded by:

   || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))

but now many, if not all, use the double reload logic which is not always
necessary. I believe this change needs partly reverting and redoing as a
special case to insert an outermode reload for a MEM subreg if the goal
class for the operand has no register suitable for inner mode, then wait for
the following iteration to deal with the MEM subreg simplification. (Just a
theory for now.)

(r242554 git:d7671d7)
Fixes the new case of MEM subreg simplification from (r239342 git:2d2b78a)

(r243782 git:856bd6f)
This is another case of multiple changes, some of which were not critical,
and I believe one of which is dangerous.  The primary aim of this change is
to reload the address before reloading the inner subreg.  This addresses
what appears to have been a dormant bug since day one, as the original
logic would have failed when reloading an inner MEM if its address was not
already valid.

The potentially fatal part of this change is the introduction of a
"return false" in the MEM subreg simplification which is placed immediately
after restoring the original subreg expression.  I believe that if control
ever actually reaches this statement then LRA would infinite loop as the
MEM subreg would never be simplified.  With the additional cases handled
by virtue of (401bd0c) then I believe there are very few reasons why control
would reach this point but presumably a MEM subreg on an operand with
a special memory constraint is one such case that would be fatal.

So what are the next steps!

1) [BUG] Add an exclusion for WORD_REGISTER_OPERATIONS because MIPS is currently
   broken by the existing code. PR78660
2) [BUG] Remove the return false introduced in (r243782 git:856bd6f).
3) [CLEANUP] Remove reg_mode argument and replace all uses of reg_mode with
   innermode.  Rename 'reg' to 'inner' and 'operand' to 'outer' and 'mode' to
   'outermode'.
4) [OPTIMISATION] Change double-reload logic so that it just deals with the
   special outermode reload without adjusting the subreg.
5) [??] Determine if big endian still needs a special case like in reload?
   Comments anyone?

RE: [RFD] Simplifying subregs in LRA

2017-02-03 Thread Matthew Fortune
Eric Botcazou  writes:
> > (r243782 git:856bd6f)
> > This is another case of multiple changes where some were not critical
> > and overall there is a dangerous one here I believe.  The primary aim
> > of this change is to reload the address before reloading the inner
> > subreg.  This appears to be a dormant bug since day1 as the original
> > logic would have failed when reloading an inner mem if its address was
> not already valid.
> >
> > The potentially fatal part of this change is the introduction of a
> > "return false" in the MEM subreg simplification which is placed
> > immediately after restoring the original subreg expression.  I believe
> > that if control ever actually reaches this statement then LRA would
> > infinite loop as the MEM subreg would never be simplified.
> 
> How can a change that is a no-op be fatal exactly?

It's not a no-op. Any MEM_P not handled by the first "if (MEM_P(reg))"
will have previously been handled by the block guarded by the following
later in the function; note the "|| MEM_P (reg)":

  /* Force a reload of the SUBREG_REG if this is a constant or PLUS or
 if there may be a problem accessing OPERAND in the outer
 mode.  */
  if ((REG_P (reg)
   && REGNO (reg) >= FIRST_PSEUDO_REGISTER
   && (hard_regno = lra_get_regno_hard_regno (REGNO (reg))) >= 0
   /* Don't reload paradoxical subregs because we could be looping
  having repeatedly final regno out of hard regs range.  */
   && (hard_regno_nregs[hard_regno][innermode]
   >= hard_regno_nregs[hard_regno][mode])
   && simplify_subreg_regno (hard_regno, innermode,
 SUBREG_BYTE (operand), mode) < 0
   /* Don't reload subreg for matching reload.  It is actually
  valid subreg in LRA.  */
   && ! LRA_SUBREG_P (operand))
  || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
{

> > So what are the next steps!
> >
> > 1) [BUG] Add an exclusion for WORD_REGISTER_OPERATIONS because MIPS is
> > currently broken by the existing code. PR78660
> 
> That seems the way to go, with the appropriate check on the mode sizes.

I'm not sure what check to do on mode sizes. Do you think an innermode
reload is only required when both modes have the same number of words?

> > 2) [BUG] Remove the return false introduced in (r243782 git:856bd6f).
> 
> !???

If a MEM subreg is neither simplified to an outermode MEM nor reloaded
in innermode then I believe LRA will never resolve the subreg.  Even if that
is not true I'm fairly certain the addition of the code has changed
behaviour and that the change is not well understood, as explained above.

> > 3) [CLEANUP] Remove reg_mode argument and replace all uses of reg_mode
> with
> >innermode.  Rename 'reg' to 'inner' and 'operand' to 'outer' and
> 'mode'
> > to 'outermode'.
> > 4) [OPTIMISATION] Change double-reload logic so that it just deals
> with the
> >special outermode reload without adjusting the subreg.
> > 5) [??] Determine if big endian still needs a special case like in
> reload?
> >Comments anyone?
> 
> I agree that a cleanup of the code would probably be in order, with an
> eye on the reload code as a model, but that's probably not appropriate
> for GCC 7.

Indeed, definitely want to wait for GCC 8.

> > In an attempt to make a minimal change I propose the following as it
> > allows WORD_REGISTER_OPERATIONS targets to benefit from the invalid
> > address reloading fix. I think the check would be more appropriately
> > placed on the outer-most if (MEM_P (reg)) but this would affect the
> > handling of many more subregs which seems too dangerous at this point
> in release.
> >
> > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index
> > 393cc69..771475a 100644
> > --- a/gcc/lra-constraints.c
> > +++ b/gcc/lra-constraints.c
> > @@ -1512,10 +1512,11 @@ simplify_operand_subreg (int nop, machine_mode
> > reg_mode) equivalences in function lra_constraints) and because for
> > spilled pseudos we allocate stack memory enough for the biggest
> >  corresponding paradoxical subreg.  */
> > - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > -   && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg
> > + if (!WORD_REGISTER_OPERATIONS
> > + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > +   && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN
> (reg)
> > return true;
> >
> >   *curr_id->operand_loc[nop] = operand;
> >
> >
> > The change will affect at least arc,mips,rx,sh,sparc though I haven't
> > checked which of these default on for LRA just that they can turn on
> LRA.
> 
> Only MIPS and SPARC (see https://gcc.gnu.org/bac

RE: ICE in gcc.dg/pr77834.c test for MIPS

2017-02-23 Thread Matthew Fortune
Segher Boessenkool  writes:
> On Thu, Feb 23, 2017 at 04:27:26PM +, Toma Tabacu wrote:
> > > This happens when you have inserted code ending in a jump on an edge.
> > > This then will need updating of the CFG, and this code does not know
> > > how to do that.
> >
> > Would the following be an appropriate solution ?
> 
> [ snip ]
> 
> > @@ -2047,6 +2047,16 @@ commit_one_edge_insertion (edge e)
> > +  /* If the edge contains control flow instructions, remember to
> > + update the CFG after the insertion.  */
> 
> I don't know what all can break if you allow control flow insns in
> insert_insn_on_edge -- this function is called from many different
> passes, many things can break -- but this does look like it should work.
> 
> > +  bool update_cfg = false;
> > +  for (tmp = insns; tmp && update_cfg == false; tmp = NEXT_INSN (tmp))
> > + if (control_flow_insn_p (tmp))
> > +   {
> > +update_cfg = true;
> > +break;
> > +   }
> 
> > +  if (update_cfg)
> > +{
> > +  auto_sbitmap blocks (last_basic_block_for_fn (cfun));
> > +  bitmap_ones (blocks);
> > +  find_many_sub_basic_blocks (blocks);
> > +
> > +  last = BB_END (bb);
> > +}
> 
> Maybe you can keep track of what blocks to split, instead of just saying
> "all".
> 
> > In short, I'm updating the CFG by calling find_many_sub_basic_blocks
> > with an all-one block bitmap (this also happens in cfgexpand.c, after
> > the edge insertions) whenever an edge contains an insn which satisfies
> > control_flow_insn_p.
> 
> General...  Patches need to go to gcc-patches@.  You also should have
> your copyright assignment in order (I have no idea if you do; if you do,
> please ignore).  Finally, trunk currently is in stage 4, this work will
> need to wait for stage 1 (a couple of months, something like that).

This is an ICE that will be reproducible on a primary target so is still
appropriate to pursue in stage4 as far as I understand.  I'm hoping to
find time to work with Toma on this issue.

> Can't whatever creates those jump insns keep the cfg in shape?  That
> would avoid all issues here.

The problem, I think, is that these instructions are not yet in the cfg
and are being inserted on an edge.  The jump is produced from the inline
memcpy expansion we do for MIPS.  In some cases there will be no loop,
some cases there will be a loop ending with the conditional jump and
some cases will have a loop and other instructions after the conditional
jump. The 1st and 3rd form will get through the logic in
commit_one_edge_insertion (albeit that the 3rd form will have incorrect
cfg actually) but the 2nd form is rejected because of ending with a jump.

Other than coping with the potential for sub blocks here or letting them
through and leaving later code to split the blocks then I see no other
way forward.  I agree it should be possible to process just the blocks
with jump instructions in the middle and that is actually going to be
exactly one block in this case.  I don't know if updating the CFG while
edges are being iterated on in commit_edge_insertions is safe though
and am somewhat out of my comfort zone in general!
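
For what it is worth, restricting the update to that single block would
only be a small change to the patch quoted above; an untested sketch,
sitting inside commit_one_edge_insertion and reusing its bb and last
locals:

  /* Untested sketch: only re-examine the block that received the
     control-flow insn, rather than passing an all-ones bitmap.  */
  if (update_cfg)
    {
      auto_sbitmap blocks (last_basic_block_for_fn (cfun));
      bitmap_clear (blocks);
      bitmap_set_bit (blocks, bb->index);
      find_many_sub_basic_blocks (blocks);
      last = BB_END (bb);
    }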

Thanks,
Matthew



RE: A problem with LRA

2017-04-18 Thread Matthew Fortune
comp  writes:
> Hi all,
> I recently have a problem with LRA.
> 1 The Bug use case
> int a=10;
> float c=2.0,d;
> main()
> {
>         float b;
>         *(int*)&b=a;
>         d=b+c;
> }
> 
> 2 The problem description
> In the pass LRA, curr_insn_transform () deals with the addition statement
> d = b + c; the corresponding rtx expression in register allocation is as
> follows:
> (gdb) pr curr_insn
> (insn 9 8 10 2 (set (reg:SF 73 [ d ])
>         (plus:SF (reg:SF 79 [ c ])
>             (subreg:SF (reg:SI 77 [ a ]) 0))) test.c:7 121 {addsf3}
>      (expr_list:REG_DEAD (reg:SF 79 [ c ])
>         (expr_list:REG_DEAD (reg:SI 77 [ a ])
>             (nil
> The corresponding addsf3 template in the .md file is defined as follows:
> (define_insn "add<mode>3"
>   [(set (match_operand:FMODE 0 "register_operand" "=f")
>         (plus:FMODE (match_operand:FMODE 1 "reg_or_0_operand" "%fG")
>                     (match_operand:FMODE 2 "reg_or_0_operand" "fG")))]
>   "TARGET_FP"
>   "fadd%/ %R1,%R2,%0"
>   [(set_attr "type" "fadd")])
> 
> curr_insn_transform() calls process_alt_operands() for matching
> constraints; the matching of operands 0, 1, and 2 is successful in each
> case.  The main matching steps for the second operand, i.e.
> (subreg:SF (reg:SI 77 [a]) 0), are as follows:
> op = no_subreg_reg_operand[nop], where nop=2;
> Here get op: (reg:SI 77 [ a ])
> mode = curr_operand_mode[nop];
> Here get mode: SFmode
> cl = reg_class_for_constraint (cn)
> Here get cn: CONSTRAINT_f, and cl: FLOAT_REGS
> 
> FLOAT_REGS is defined as being able to allocate all hard registers in the
> REG_CLASS_CONTENTS macro that describes the processor's registers in the
> machine description, so the matching key function in_hard_reg_set_p
> (this_alternative_set, mode, hard_regno[nop]) returns true, where pseudo
> reg 77 was assigned hard reg $1 in the IRA pass, i.e. hard_regno[nop]=1.
> The hardware register $1 belongs to FLOAT_REGS and also belongs to
> GENERAL_REGS, but it was derived from the integer a, so the previously
> matched instruction that generated $1 as the destination register is an
> integer load instruction ldw.  Thus the d = b + c statement generates the
> instruction fadds $48, $1, $48, where c is assigned to $48, b is assigned
> to $1, and the result d lies in $48.  The result is the following
> instructions:
> ldw $1,a($1)
> flds $48,c($2)
> fadds $48,$1,$48
> The problem lies in the second source operand of the floating-point
> addition fadds instruction: $48 is obtained by the floating-point load
> instruction flds, but $1 is obtained by the integer load instruction ldw,
> so the result is wrong.  We hope that process_alt_operands() results in a
> match failure, so that a reload may be generated that turns the ldw into
> an flds instruction.

Isn't this a CANNOT_CHANGE_MODE_CLASS issue then? I.e. don't allow it to be
reinterpreted from integer mode to floating point mode.
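
Something along these lines in the port's header is the kind of thing I
mean (a rough sketch only; FLOAT_REGS is taken from your description and
the exact condition would need tuning for the port):

/* Rough sketch: refuse to access a value held in an FP register under a
   mode of a different class, forcing a proper move or conversion instead
   of a bitwise reinterpretation.  */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)        \
  (GET_MODE_CLASS (FROM) != GET_MODE_CLASS (TO)          \
   && reg_classes_intersect_p ((CLASS), FLOAT_REGS))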

> 3 The comparative test
> In contrast, if the $1 in the REG_CLASS_CONTENTS register category is
> defined as not belonging to FLOAT_REGS, the above process_alt_operands()
> returns false when the second operand is matched (in_hard_reg_set_p
> (this_alternative_set, mode, hard_regno[nop]) returns fail), and so a
> reload is triggered; an ifmovs instruction will be generated to move the
> contents of the integer register to the floating point register.  The
> following instructions are correct:
> ldw $1,a($1)
> flds $f11,c($2)
> ifmovs $1,$f10
> fadds $f11,$f10,$f11

This is not consistent with the previous example code as the floating point
registers now have 'f' prefixes but did not before.

You will need to explain more about what registers are available and how
they can be used (or not as the case may be).

If this issue ends up anywhere near subreg handling inside LRA then please
keep me on CC.  I think this is likely to just be a backend issue from the
description so far though.

Thanks,
Matthew


RE: Overwhelmed by GCC frustration

2017-08-01 Thread Matthew Fortune
Richard Biener  writes:
> On Mon, Jul 31, 2017 at 7:08 PM, Andrew Haley  wrote:
> > On 31/07/17 17:12, Oleg Endo wrote:
> >> On Mon, 2017-07-31 at 15:25 +0200, Georg-Johann Lay wrote:
> >>> Around 2010, someone who used a code snipped that I published in a
> >>> wiki, reported that the code didn't work and hang in an endless
> >>> loop.  Soon I found out that it was due to some GCC problem, and I
> >>> got interested in fixing the compiler so that it worked with my
> >>> code.
> >>>
> >>> 1 1/2 years later, in 2011, [...]
> >>
> >> I could probably write a similar rant.  This is the life of a
> >> "minority target programmer".  Most development efforts are being
> >> done with primary targets in mind.  And as a result, most changes are
> >> being tested only on such targets.
> >>
> >> To improve the situation, we'd need a lot more target specific tests
> >> which test for those regressions that you have mentioned.  Then of
> >> course somebody has to run all those tests on all those various
> >> targets.  I think that's the biggest problem.  But still, with a test
> >> case at hand, it's much easier to talk to people who have silently
> >> introduced a regression on some "other" targets.  Most of the time
> >> they just don't know.
> >
> > It's a fundamental problem for compilers, in general: every
> > optimization pass wants to be the last one, and (almost?) no-one who
> > writes a pass knows all the details of all the subsequent passes.  The
> > more sophisticated and subtle an optimization, the more possibility
> > there is of messing something up or confusing someone's back end or a
> > later pass.  We've seen this multiple times, with apparently
> > straightforward control flow at the source level turning into a mess
> > of spaghetti in the resulting assembly.  But we know that the
> > optimization makes sense for some kinds of program, or at least that
> > it did at the time the optimization was written.  However, it is
> > inevitable that some programs will be made worse by some
> > optimizations.  We hope that they will be few in number, but it really
> > can't be helped.
> >
> > So what is to be done?  We could abandon the eternal drive for more
> > and more optimizations, back off, and concentrate on simplicity and
> > robustness at the expense of ultimate code quality.  Should we?  It
> > would take courage, and there will be an eternal pressure to improve
> > code.  And, of course, we'd risk someone forking GCC and creating the
> > "superoptimized GCC" project, starving FSF GCC of developers.  That's
> > happened before, so it's not an imaginary risk.
> 
> Heh.  I suspect -Os would benefit from a separate compilation pipeline
> such as -Og.  Nowadays the early optimization pipeline is what you want
> (mostly simple CSE & jump optimizations, focused on code size
> improvements).  That doesn't get you any loop optimizations but loop
> optimizations always have the chance to increase code size or register
> pressure.
> 
> But yes, targeting an architecture like AVR which is neither primary nor
> secondary (so very low priority) _plus_ being quite special in target
> abilities (it seems to be very easy to mess up things) is hard.
> 
> SUSE does have some testers doing (also) code size monitoring but as
> much data we have somebody needs to monitor it, further bisect and
> report regressions deemed worthwhile.  It's hard to avoid slow creep --
> compile-time and memory use are a similar issue here.

Towards the end of last year we ran a code size analysis over time for
MIPS GCC (I believe microMIPSR3 to be specific) between Oct 2013 and
Aug 2016 taking every 50th commit if memory serves. I have a whole bunch
of graphs for open source benchmarks that I may be able to share. The
net effect was a significant code size reduction with just a few short
(<2months) regressions. Not all benchmarks ended up at the best ever code
size and some regressions were countered by different optimisations than
the ones that introduced the regression (so the issue wasn't strictly
fixed in all cases). Over this period I would therefore be surprised if
GCC has caused significant code size regressions in general. I don't have
the detailed analysis to hand but a significant code size reduction
happened ~Mar/Apr 2014 but I can't remember why that was. I do remember a
spike when changing to LRA but that settled down (mostly).

Matthew


RE: Redundant sign-extension instructions on RISC-V

2017-08-30 Thread Matthew Fortune
Jeff Law  writes:
> On 08/30/2017 06:52 AM, Richard Biener wrote:
> > On Wed, Aug 30, 2017 at 11:53 AM, Michael Clark  
> > wrote:
> >>
> >>> On 30 Aug 2017, at 9:43 PM, Michael Clark  wrote:
> >>>
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index ce632ae..25dd70f 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -1503,6 +1503,10 @@ simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
> > /* (sign_extend:M (lshiftrt:N <X> (const_int I))) is better as
> >(zero_extend:M (lshiftrt:N <X> (const_int I))) if I is not 0.  */
> > if (GET_CODE (op) == LSHIFTRT
> > +#if defined(POINTERS_EXTEND_UNSIGNED)
> > +  /* we skip this optimisation if pointers naturally extend signed */
> > + && POINTERS_EXTEND_UNSIGNED
> > +#endif
> >&& CONST_INT_P (XEXP (op, 1))
> >&& XEXP (op, 1) != const0_rtx)
> >  return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
> 
>  Is it just me or does this miss a || mode != Pmode || GET_MODE (op) != 
>  ptr_mode
>  check?  Note the comment says exactly the opposite as the transform...
> 
>  I’m not even sure why this simplification is correct in the first place?!
> >>>
> >>> I hope you are not confusing my use of POINTERS_EXTEND_UNSIGNED as a
> >>> proxy for the property that defines whether sub-width operations
> >>> sign-extend to the full width of the register vs zero extend. Are you
> >>> talking about our added comment?
> >
> > I'm talking about using POINTERS_EXTEND_UNSIGNED for sth that looks
> > unrelated (and that has no easy way to be queried as you noted).
> Right.  I was going to make the same observation.  I can't see how
> POINTER_EXTEND_UNSIGNED plays a significant role here.
> 
> MIPS has similar properties and my recollection is they did some
> interesting tricks in the target files to fold the extension back into
> the arithmetic insns (beyond the usual LOAD_EXTEND_OP,
> WORD_REGISTER_OPERATIONS, TRULY_NOOP_TRUNCATION, and PROMOTE_MODE stuff).

If there is a condition to add I would have expected it to be based around
WORD_REGISTER_OPERATIONS etc too.

> My recollection was they defined their key insns with match_operators
> that allowed the sign extension to occur in the arithmetic insns.  But I
> don't see any evidence of that anymore.  But I can distinctly remember
> discussing it with Ian and Meissner eons ago and its impact on reload in
> particular.

I see that riscv has chosen to not allow ior/and/xor with SImode as named
patterns but instead just for combine to pick up. Given that the
architecture has almost all the same properties as MIPS I don't follow why
the SImode version is not allowed at expand time. MIPS relies on all SImode
values being in a canonical sign extended form at all points and can
therefore freely represent the dual (or rather no) mode operations, such as
comparisons and logical operations, as both SI and DI mode. This pretty much
solves the redundant sign extension issues. Just because the logical
operations only exist as '64-bit' operations in the 64-bit architecture does
not mean you can't tell GCC there are 32-bit versions as well; you just have
to present a logical view of the architecture rather than being overly
strict. LLVM for MIPS went through similar issues and I suspect RISC-V will
hit the same kind of issues; the same solution was used there, with both
32-bit and 64-bit operations described by the same underlying instruction.
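
To make that concrete, the kind of code affected looks something like this
(illustrative only, not taken from a particular PR): on a 64-bit target
that keeps SImode values sign-extended in registers, the 32-bit OR below
already leaves a correctly extended value, so the widening should cost
nothing; whether the compiler can prove that depends on the port describing
SImode logical operations rather than only DImode ones.

/* A 32-bit logical result widened to 64 bits.  */
long long
widen_or (int a, int b)
{
  return (long long) (a | b);
}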

Is there an architectural difference that means riscv can't do the same
thing?

Matthew


RE: [RFC v2] MIPS ABI Extension for IEEE Std 754 Non-Compliant Interlinking

2017-10-18 Thread Matthew Fortune
Hi Maciej,

Slow thread, sorry for dragging it out further...

Maciej Rozycki  writes:
> On Fri, 11 Nov 2016, Matthew Fortune wrote:
> 
> > This means that a user consciously creating an object that 'needs' ieee
> > compliance via use of -fieee=strict or -mieee=strict is thwarted by the
> > next user who builds the executable. This kind of scenario can occur with
> > a static library prepared by an expert in floating point and then someone
> > casually including that into a bigger application. Obviously a similar
> > issue is present with the rules around executable and shared libraries
> > where the executable's compliance mode can override a shared library
> > but at this level we are not losing any information and the executable
> > has either very specifically been set to 'relaxed' mode or the kernel
> > has set legacy to mean relaxed. The latter can at least be fixed by
> > changing the kernel. Losing information in a static link cannot be
> > fixed.
> 
>  I think I can see your point and I admit I may have oversimplified the
> model, losing a piece of crucial information and consequently control.
> 
>  What I can propose is a changed model which requires three states at
> compilation/assembly, and then four states at link/load time automatically
> determined by the input objects, with a possible influence of linker
> command-line options to prevent certain transitions.  These are (names up
> to discussion):
> 
> 1. Strict -- known to require the NaN encodings to match,
> 
> 2. Unknown -- may or may not require the NaN encodings to match,
> 
> 3. Unneeded -- known not to require the NaN encodings to match

Am I right in thinking unneeded is not the same as no-float, it has
floating point in it but explicitly does not care about NaN encodings
because it was built with an option that said so?

> -- at compilation/assembly and:
> 
> A. Strict -- enforcing matching NaN encodings -- built from strict,
>unknown and unneeded objects of the matching NaN encoding,

Must have at least one strict object?

> 
> B. Unknown -- matching the NaN encodings, but not enforcing it -- built
>from unknown and unneeded objects of the matching NaN encoding,
> 
> C. Unneeded -- not requiring the NaN encodings to match -- built from only
>unneeded objects of the matching NaN encoding,
> 
> D. Relaxed -- known not to match either NaN encoding -- built from unknown
>and unneeded objects of which at least one does not match the NaN
>encoding of the remaining objects, or from at least one relaxed object.
> 
> -- at link/load time.  Any other object combinations would result in a
> link/load failure, e.g. you could not mix A with a D object, or any object
> not matching the NaN encoding.
> 
>  The difference between B and C is at the run time -- the treatment of B
> is controlled by the "ieee754=" kernel option, whereas C always ignores
> NaN compatibility of the hardware.  The difference between C and D is at
> the link/load time -- C can be upgraded to A or B, but D is inherently
> lost and remains at D.  At the ELF binary level B objects correspond to
> what I previously referred to as legacy objects, i.e. no extra annotation
> beyond the EF_MIPS_NAN2008 bit.  There could be a linker command-line
> option to prevent a transition from B to D from happening if not desired,
> causing a link failure.
> 
>  The states would be maintained at run-time, when a DSO is dlopen(3)ed.
> A would accept A, B or C if matching the NaN encoding, and stay at A.  B
> would accept B or C if matching the NaN encoding, and stay at B.  With the
> relaxed kernel configuration B would also accept B or C using the opposite
> NaN encoding or D, and switch to D.  C would accept C if matching the NaN
> encoding, and stay at C.  C would accept B if matching the NaN encoding,
> and switch to B.  C would accept B or C using the opposite NaN encoding or
> D, and switch to D.  Any other combinations would cause a dlopen(3)
> failure.

I'm not entirely sure why 3 or C should care about nan encoding at all
because they should be totally independent of the concept. I.e. link
any combination, choose a NaN encoding for the resulting object at
random but disregard the NaN encoding coming from an unneeded object
when combining with others.

>  In this model only the initial state is determined by the main executable
> and further transitions are possible as dynamic objects are added, making
> the use of prctl(3) to switch states more prominent.  One unfortunate
> consequence is that dlopen(3)ing an A DSO from a B or C executable
> switches its state to A permanently maki

Register conflicts between output memory address and output register

2018-04-17 Thread Matthew Fortune
Hi,

I've been investigating some quirks of register allocation when handling
some inline asm. The behaviour is non-intuitive but I am not sure if it
is a bug or not. This is back on GCC 6 so I'm still reviewing to see if
anything changed in this area since then.

The inline asm in question is:

int bar;
int my_mem;

void foo()
{
  asm volatile ("%0, %1, %2" : "=m"(my_mem), "=r"(bar)
 : "m"(my_mem)
 : "memory");
}

What I see is that if the address of my_mem is lowered prior to IRA then
there is a pseudo register for the output memory address (and perhaps an
offset) and a pseudo register for the output register. These two registers
are seen as a conflict in IRA so get different registers allocated. This is
good.

If however the address of my_mem is lowered after IRA i.e. when validating
constraints in LRA then IRA has nothing to do as the address is just a
symbol_ref. When LRA resolves the constraint for the address it introduces
a register for the output memory address but does not seem to acknowledge
any conflict with the output register (bar) it can therefore end up
using the same register for the output memory address as the output register.
This leads to the obvious problem if the ASM updates %1 before %0 as it will
corrupt the address.

This can of course be worked around by making (bar) an early clobber or an
in/out but this does seem unnecessary.
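
For completeness, the workaround is just the early-clobber modifier on the
register output; the only change from the example above is "=&r":

int bar;
int my_mem;

void foo (void)
{
  /* "=&r" tells the allocator that bar's register may be written before
     the address of the memory operands is consumed, so it must not share
     a register with that address.  */
  asm volatile ("%0, %1, %2" : "=m"(my_mem), "=&r"(bar)
                             : "m"(my_mem)
                             : "memory");
}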

The question is... Should LRA recognise a conflict between the registers
involved in the address portion of an output memory operand and any output
register operands or is this a case where you strictly have to use early
clobber.

Any advice welcome!

Thanks,
Matthew


RE: MIPS maintainership

2018-04-27 Thread Matthew Fortune
Hi Catherine,

Thank-you for all the advice and guidance while we have been co-maintaining
the MIPS backend; it's been a pleasure.

Thanks,
Matthew


From: Moore, Catherine [mailto:catherine_mo...@mentor.com] 
Sent: 25 April 2018 22:52
To: gcc@gcc.gnu.org
Cc: Matthew Fortune
Subject: MIPS maintainership

Hi all,

I need to resign as maintainer for the MIPS port.  My work commitments have 
taken me in a different direction and as a result I haven't been able to 
actively participate over the last year.  I don't see that changing anytime 
soon.  I hope that someone with the interest and the time is available and will 
volunteer.

Thanks,
Catherine


RE: Introducing a nanoMIPS port for GCC

2018-05-02 Thread Matthew Fortune
Robert Suchanek  writes:
> the last 18 months.  This announcement has a general introduction at 
> the start, so if you have already read it for one of the other tools, 
> you can skip down to the information specific to GCC.

Thanks, Robert.

Corresponding technical info for other toolchain components can be
found in the following archived posts.

binutils/gdb/gold
=================
http://sourceware.org/ml/binutils/2018-05/msg3.html

qemu
====
http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg00081.html

Thanks,
Matthew


RE: Introducing a nanoMIPS port for GCC

2018-05-02 Thread Matthew Fortune
Joseph Myers  writes:
> On Wed, 2 May 2018, Matthew Fortune wrote:
> 
> > qemu
> > 
> > http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg00081.html
> 
> That answers one thing I was wondering, by saying you're using the generic
> Linux kernel syscall interface rather than any of the existing MIPS
> syscall interfaces.
> 
> Is your Linux kernel port available somewhere (or a description of it
> corresponding to these descriptions of changes to toolchain components)?

Hi Joseph,

The kernel is being prepared for a branch in the linux-mips.org repository
and a document adding to the wiki there. All being well it will not take
too long to get that available.

To my knowledge the major areas of the nanoMIPS kernel that are not yet
finalised are the debug interfaces but our kernel engineers will be able
to give a more detailed description.

Matthew


RE: RISC-V ELF multilibs

2018-05-31 Thread Matthew Fortune
Palmer Dabbelt  writes:
> On Tue, 29 May 2018 11:02:58 PDT (-0700), Jim Wilson wrote:
> > On 05/26/2018 06:04 AM, Sebastian Huber wrote:
> >> Why is the default multilib and a variant identical?
> >
> > This is supposed to be a single multilib, with two names.  We use
> > MULTILIB_REUSE to map the two names to a single multilib.
> >
> > rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc
> > ./rv64imafdc/lp64d/libgcc.a
> > rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc
> > ./rv64imafdc/lp64d/libgcc.a
> > rohan:1032$ ./xgcc -B./ --print-libgcc
> > ./libgcc.a
> > rohan:1033$
> >
> > So this is working right when the -march option is given, but not
> when
> > no -march is given.  I'd suggest a bug report so I can track this, if
> > you haven't already filed one.
> 
> IIRC this is actually a limit of the GCC build system: there needs to
> be some
> default multilib, and it has to be unprefixed.  I wanted to keep the
> library
> paths orthogonal (ie, not bake in a default that rv64gc/lp64d lives at
> /lib),
> so I chose to just build a redundant multilib.
> 
> It'd be great to get rid of this, but I'm afraid it's way past my level
> of
> understanding as to how all this works.

I do actually have a solution for this but it is not submitted upstream.
MIPS has basically the same set of problems that RISC-V does in this area
and in an ideal world there would be no 'fallback' multilib such that if
you use compiler options that map to a library variant that does not
exist then the linker just fails to find any libraries at all rather than
using the default multilib.

I can share the raw patch for this and try to give you some idea about how
it works. I am struggling to find time to do much open source support at
the moment so may not be able to do all the due diligence to get it
committed. Would you be willing to take a look and do some of the work to
get it in tree?

Matthew


RE: Testing a plugin optimization with 'make check' (slashes in options)

2013-05-01 Thread Matthew Fortune
Hi,

> I was wondering if someone would tell me how to pass an option that
> contains slashes into 'make check'?
> 
> For example if I want to test a compiler using a simulator and the -O3 option
> I can run:
> 
> make check RUNTESTFLAGS="--target_board='mips-sim-mti32/-O3'"
> 
> I want to run this:
> 
> make check RUNTESTFLAGS="--target_board='mips-sim-mti32/-
> fplugin=/home/sellcey/plugin/dynopt.so'"

Using a brace expansion may help here though I haven't checked. Forward slashes
are for the first set of run variants and braces are for the second set. Given
that the separator for brace expansion is ',' the forward slashes would
hopefully not matter. It's a bit of a guess though:

make check RUNTESTFLAGS="--target_board='mips-sim-mti32\{-fplugin=/home/sellcey/plugin/dynopt.so\}'"

Matthew

> 
> But the slashes in the plugin path are messing things up (I tried putting 1, 
> 2,
> or 3 backslashes in front of the forward slashes in the path but that did not
> help and I tried putting /home/sellcey/plugin/dynopt.so in single quotes,
> that did not help either.
> 
> Steve Ellcey
> sell...@imgtec.com




mips16 LRA vs reload - Excess reload registers

2013-08-23 Thread Matthew Fortune
Hi Vladimir,

I've been working on code size improvements for mips16 and have been pleased to 
see some improvement when switching to use LRA instead of classic reload. At 
the same time though I have also seen some differences between reload and LRA 
in terms of how efficiently reload registers are reused.

The trigger for LRA to underperform compared with classic reload is when IRA 
allocates inappropriate registers and thus puts a lot of stress on reloading. 
Mips16 showed this because it can only access a small subset of the MIPS 
registers for general instructions. The remaining MIPS registers are still 
available as they can be accessed by some special instructions and used via 
move instructions as temporaries. In the current mips16 backend, register move 
costings lead IRA to determine that although the preferred class for most 
pseudos is M16_REGS, the allocno class ends up as GR_REGS. IRA then resorts to 
allocating registers outside of M16_REGS more and more as register pressure 
increases, even though this is fairly stupid. 

When using classic reload the inappropriate register allocations are 
effectively reverted as the reload pseudos that get invented tend to all 
converge on the same hard register completely removing the original pseudo. For 
LRA the reloads tend to diverge and different hard registers are assigned to 
the reload pseudos leaving us with two new pseudos and the original. Two extra 
move instructions and two extra hard registers used. While I'm not saying it is 
LRA's fault for not fixing this situation perfectly it does seem that classic 
reload is better at it.

I have found a potential solution to the original IRA register allocation 
problem but I think there may still be something to address in LRA to improve 
this scenario anyway. My proposed solution to the IRA problem for mips16 is to 
adjust register move costings such that the total of moving between M16_REGS 
and GR_REGS and back is more expensive than memory, but moving from GR_REGS to 
GR_REGS is cheaper than memory (even though this is a bit weird as you have to 
go through an M16_REG to move from one GR_REG to another GR_REG).

GR_REGS to GR_REGS has to be cheaper than memory as it needs to be a candidate 
pressure class but the additional cost for M16->GR->M16 means that IRA does not 
use GR_REGS as an alternative class and the allocno class is just M16_REGS as 
desired. This feels a bit like a hack but may be the best solution. The hard 
register costings used when allocating registers from an allocno class just 
don't seem to be strong enough to prevent poor register allocation in this 
case; I don't know if the hard register costs are supposed to resolve this 
issue or if they are just about fine tuning.
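
For concreteness, a minimal sketch of the kind of cost adjustment described
above, assuming the standard TARGET_REGISTER_MOVE_COST hook; the function name
and the exact numbers are illustrative only and would need tuning against the
target's memory move cost:

static int
mips16_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
                           reg_class_t from, reg_class_t to)
{
  bool from_m16 = reg_class_subset_p (from, M16_REGS);
  bool to_m16 = reg_class_subset_p (to, M16_REGS);

  if (from_m16 && to_m16)
    return 2;   /* M16 <-> M16 moves stay cheap.  */
  if (from_m16 || to_m16)
    return 7;   /* M16 <-> GR: a round trip must cost more than memory.  */
  return 5;     /* GR <-> GR (via an M16 temporary): cheaper than memory.  */
}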

With the fix in place, LRA outperforms classic reload which is fantastic!

I have a small(ish) test case for this and dumps for IRA, LRA and classic 
reload along with the patch to enable LRA for mips16. I can also provide the 
fix to register costing that effectively avoids/hides this problem for mips16. 
Should I post them here or put them in a bugzilla ticket?

Any advice on which area needs fixing would be welcome and I am quite happy to 
work on this given some direction. I suspect these issues are relevant for any 
architecture that is not 100% orthogonal which is pretty much all and 
particularly important for compressed instruction sets.

Regards,
Matthew

--
Matthew Fortune
Leading Software Design Engineer, MIPS processor IP
Imagination Technologies Limited
t: +44 (0)113 242 9814
www.imgtec.com




RE: mips16 LRA vs reload - Excess reload registers

2013-09-09 Thread Matthew Fortune


> -Original Message-
> From: Vladimir Makarov [mailto:vmaka...@redhat.com]
> Sent: 08 September 2013 17:51
> To: Matthew Fortune
> Cc: gcc@gcc.gnu.org; ber...@codesourcery.com
> Subject: Re: mips16 LRA vs reload - Excess reload registers
> 
> On 13-08-23 5:26 AM, Matthew Fortune wrote:
> > Hi Vladimir,
> >
> > I've been working on code size improvements for mips16 and have been
> pleased to see some improvement when switching to use LRA instead of
> classic reload. At the same time though I have also seen some differences
> between reload and LRA in terms of how efficiently reload registers are
> reused.
> >
> > The trigger for LRA to underperform compared with classic reload is when
> IRA allocates inappropriate registers and thus puts a lot of stress on
> reloading. Mips16 showed this because it can only access a small subset of
> the MIPS registers for general instructions. The remaining MIPS registers are
> still available as they can be accessed by some special instructions and used
> via move instructions as temporaries. In the current mips16 backend,
> register move costings lead IRA to determine that although the preferred
> class for most pseudos is M16_REGS, the allocno class ends up as GR_REGS.
> IRA then resorts to allocating registers outside of M16_REGS more and more
> as register pressure increases, even though this is fairly stupid.
> >
> > When using classic reload the inappropriate register allocations are
> effectively reverted as the reload pseudos that get invented tend to all
> converge on the same hard register completely removing the original
> pseudo. For LRA the reloads tend to diverge and different hard registers are
> assigned to the reload pseudos leaving us with two new pseudos and the
> original. Two extra move instructions and two extra hard registers used.
> While I'm not saying it is LRA's fault for not fixing this situation 
> perfectly it
> does seem that classic reload is better at it.
> >
> > I have found a potential solution to the original IRA register allocation
> problem but I think there may still be something to address in LRA to
> improve this scenario anyway. My proposed solution to the IRA problem for
> mips16 is to adjust register move costings such that the total of moving
> between M16_REGS and GR_REGS and back is more expensive than memory,
> but moving from GR_REGS to GR_REGS is cheaper than memory (even
> though this is a bit weird as you have to go through an M16_REG to move
> from one GR_REG to another GR_REG).
> >
> > GR_REGS to GR_REGS has to be cheaper than memory as it needs to be a
> candidate pressure class but the additional cost for M16->GR->M16 means
> that IRA does not use GR_REGS as an alternative class and the allocno class is
> just M16_REGS as desired. This feels a bit like a hack but may be the best
> solution. The hard register costings used when allocating registers from an
> allocno class just don't seem to be strong enough to prevent poor register
> allocation in this case, I don't know if the hard register costs are supposed 
> to
> resolve this issue or if they are just about fine tuning.
> >
> > With the fix in place, LRA outperforms classic reload which is fantastic!
> >
> > I have a small(ish) test case for this and dumps for IRA, LRA and classic
> reload along with the patch to enable LRA for mips16. I can also provide the
> fix to register costing that effectively avoids/hides this problem for mips16.
> Should I post them here or put them in a bugzilla ticket?
> >
> > Any advice on which area needs fixing would be welcome and I am quite
> happy to work on this given some direction. I suspect these issues are
> relevant for any architecture that is not 100% orthogonal which is pretty
> much all and particularly important for compressed instruction sets.
> >
> Sorry again than I did not find time to answer you earlier, Matt.
> 
> Your hack could work.  And I guess it is always worth to post the patch for
> public with examples of the generated code before and after the patch.
> May be some collective mind helps to figure out more what to do with the
> patch.

I'll post that shortly.
 
> But I guess there is still a thing to do. After constraining allocation only 
> to
> MIPS16 regs we still could use non-MIPS16 GR_REGS for storing values of
> less frequently used pseudos (as storing them in non-MIPS16 GR_REGS is
> better than in memory).  E.g. x86-64 LRA can use SSE regs for storing values
> of less frequently used pseudos requiring GENERAL_REGS.
> Please look at spill_class target hook and its implementation for x86-64.

I have indeed implemented that for mips16 and found that not only doe

RE: mips16 LRA vs reload - Excess reload registers

2013-09-18 Thread Matthew Fortune
> > My original post was trying to point out an instance where LRA is not
> performing as well as reload. Although I can avoid this for mips16 it may well
> occur in other circumstances but not be as noticeable. Is this something
> worth pursuing?
> >
> Yes, it is worth pursuing.  Whatever reload does to improve code of IRA, it
> can be better done by global register allocator as it sees all picture not 
> just a
> local context.
> 
> Besides right hard reg move cost value problem, finding reg class for pseudos
> (in IRA or in the old RA) has some pitfalls which can be generally fixed only 
> by
> early choosing insn alternatives before RA.  For example, I know that a
> problem with better use of ARM neon registers could be fixed by this.  But it
> is a bit different story about early code selection.

I have posted a test case and further details in bug 58461. It includes my 
proposed fix to register allocation for mips16 as well but that is a secondary 
issue.

Regards,
Matthew



RE: [MIPS] Optimizing stack frames for codesize

2013-10-04 Thread Matthew Fortune
> From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
> Matthew Fortune  writes:
> > I have been looking at using the FRAME_GROWS_DOWNWARD macro to
> change
> > the layout of a frame such that spill slots end up closer to the stack
> > pointer.  This is useful as it leads to more spill/reload instructions
> > being encodable with 16bit instructions for mips16 and micromips, the
> > same is likely to be true for other compressed instruction sets in
> > other archs.
> >
> > Currently FRAME_GROWS_DOWNWARD is only set when
> flag_stack_protect for
> > mips as it is a prerequisite for stack protection. I can't think of
> > any drawback to having the frame grow downward unconditionally for
> > mips16 and micromips. I don't believe there is any impact on
> > performance or debug regardless of the way a frame is populated. I'm
> > tempted to suggest changing this for all mips ISAs but I feel I must
> > be missing some important point.
> 
> I suppose the main potential disadvantage for MIPS16 is that it becomes
> harder to predict before register allocation whether a particular access will
> be extended or unextended.  E.g. at the moment, a frame address will look
> like:
> 
>   (mem (plus (reg $frame) (const_int X)))
> 
> and using X to decide the cost of an address will give a reasonably good
> indication of whether the final $sp-based address will be extended.
> With FRAME_GROWS_DOWNWARD it becomes:
> 
>   (mem (plus (reg $frame) (const_int -X)))
> 
> with larger X indicating smaller final offsets, and with the final offset
> changing each time the frame grows.  That includes each time a spill slot is
> added.
> 
> That isn't a problem for microMIPS (if it's a problem at all), because we 
> don't
> take the length into account while optimising.  We only do that for MIPS16.

That's an interesting point. I'll try and look into what impact that may have.

> But in reality, the only reason it's the way it is now is because
> !FRAME_GROWS_DOWNWARD is the traditional behaviour, probably from
> well before MIPS16 was added.  AFAIK no-one has measured whether
> changing it makes things worse or better.  So when -fstack-protect support
> was added, we just made the minimal change.
> 
> If you have time to benchmark it then let's go with whatever turns out best.

I have already done some investigation (only with bare metal tools so far) and 
that shows that growing the frame downward almost universally improves code 
size. This is looking at the detail of function level size comparison as well 
as overall library sizes. I have not done any performance analysis as yet.
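
For reference, a minimal sketch of the change being benchmarked, assuming the
macro stays where it is today (mips.h); the exact condition shown is
illustrative rather than a final proposal:

/* Grow the frame downwards so that spill slots end up at small offsets
   from $sp, which the short-form MIPS16/microMIPS loads and stores can
   encode; keep the existing -fstack-protector behaviour.  */
#define FRAME_GROWS_DOWNWARD (TARGET_MIPS16 || TARGET_MICROMIPS \
                              || flag_stack_protect)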

> However, I think it'd be better to get the LRA transition done first, 
> including
> fixing the spilling of MIPS16 registers into temporary non-MIPS16 registers
> rather than the stack.  At the moment we're using stack reloads far more
> than we ought to, which I think would skew the figures a bit.

I would like to get the LRA transition in progress but I was/am under the 
impression that I'd have to do this for all ISAs/variants of MIPS at the same 
time rather than just mips16, and as such it will take time to test sufficiently.

Regards,
Matthew





addsi3_mips16 and frame pointer with LRA

2013-10-29 Thread Matthew Fortune
Hi Richard/Vladimir,

I believe I finally understand one of the issues with LRA and mips16 but I 
can't see how to solve it. Take the following instruction:

(insn 5 18 6 2 (set (reg:SI 4 $4)
    (plus:SI (reg/f:SI 78 $frame)
    (const_int 16 [0x10]))) test.c:6 13 {*addsi3_mips16}
 (nil))

$frame will be eliminated to either $sp or the hard frame pointer, which for 
mips16 is $17. The problem here is that there is no single alternative that 
accepts either $sp or $17 because the supported immediate range is different 
for $sp(ks) and $17(d). The "ks" alternative is disregarded (presumably because 
there is no way to reload into $sp if that ended up being necessary) and 
instead the "d" alternative is chosen. If the frame pointer is needed then this 
works well because $17 is used and fits the "d" constraint however when the 
frame pointer is omitted $sp has to be reloaded into a "d" register even though 
there is another alternative which it would directly match.

The fragment of the reload dump:

    1 Matching alt: reject+=2
    1 Non-pseudo reload: reject+=2
  alt=4,overall=10,losers=1,rld_nregs=1
    1 Matching alt: reject+=2
    1 Non-pseudo reload: reject+=2
  alt=5,overall=10,losers=1,rld_nregs=1
    1 Non-pseudo reload: reject+=2
    1 Non-pseudo reload: reject+=2
  alt=7,overall=8,losers=1,rld_nregs=1
    1 Non-pseudo reload: reject+=2
    2 Non-pseudo reload: reject+=2
    2 Non input pseudo reload: reject++
    alt=8,overall=17,losers=2 -- refuse
 Choosing alt 7 in insn 5:  (0) d  (1) d  (2) O {*addsi3_mips16}
  Creating newreg=198 from oldreg=78, assigning class M16_REGS to r198

And the end result is:

(insn 20 18 5 2 (set (reg/f:SI 2 $2 [198])
    (reg/f:SI 29 $sp)) test.c:6 290 {*movsi_mips16}
 (nil))
(insn 5 20 6 2 (set (reg:SI 4 $4)
    (plus:SI (reg/f:SI 2 $2 [198])
    (const_int 16 [0x10]))) test.c:6 13 {*addsi3_mips16}
 (nil))

The only way I can currently see to get any direct usage of $sp in an add 
instruction would be to artificially reduce the permitted immediate range by 1 
bit so that there is a single alternative that allows either "ks" or "d" with a 
15-bit immediate. I don't really want to do that though. I initially allowed 
$frame to be treated as per $sp but that led to an ICE when $frame was 
eliminated to $17 and the immediate was out of range.

Have I missed anything that would allow me to support the full immediate range 
in all cases?

Regards,
Matthew

Matthew Fortune
Leading Software Design Engineer, MIPS processor IP
Imagination Technologies Limited
t: +44 (0)113 242 9814
www.imgtec.com




RE: addsi3_mips16 and frame pointer with LRA

2013-10-29 Thread Matthew Fortune
> On 10/29/2013 09:43 AM, Matthew Fortune wrote:
> > Hi Richard/Vladimir,
> >
> > I believe I finally understand one of the issues with LRA and mips16 but I
> can't see how to solve it. Take the following instruction:
> >
> > (insn 5 18 6 2 (set (reg:SI 4 $4)
> > (plus:SI (reg/f:SI 78 $frame)
> > (const_int 16 [0x10]))) test.c:6 13 {*addsi3_mips16}
> >  (nil))
> >
> > $frame will be eliminated to either $sp or the hard frame pointer, which
> for mips16 is $17. The problem here is that there is no single alternative 
> that
> accepts either $sp or $17 because the supported immediate range is
> different for $sp(ks) and $17(d). The "ks" alternative is disregarded
> (presumably because there is no way to reload into $sp if that ended up
> being necessary) and instead the "d" alternative is chosen. If the frame
> pointer is needed then this works well because $17 is used and fits the "d"
> constraint however when the frame pointer is omitted $sp has to be
> reloaded into a "d" register even though there is another alternative which
> it would directly match.
> >
> > The fragment of the reload dump:
> >
> > 1 Matching alt: reject+=2
> > 1 Non-pseudo reload: reject+=2
> >   alt=4,overall=10,losers=1,rld_nregs=1
> > 1 Matching alt: reject+=2
> > 1 Non-pseudo reload: reject+=2
> >   alt=5,overall=10,losers=1,rld_nregs=1
> > 1 Non-pseudo reload: reject+=2
> > 1 Non-pseudo reload: reject+=2
> >   alt=7,overall=8,losers=1,rld_nregs=1
> > 1 Non-pseudo reload: reject+=2
> > 2 Non-pseudo reload: reject+=2
> > 2 Non input pseudo reload: reject++
> > alt=8,overall=17,losers=2 -- refuse
> >  Choosing alt 7 in insn 5:  (0) d  (1) d  (2) O {*addsi3_mips16}
> >   Creating newreg=198 from oldreg=78, assigning class M16_REGS to
> > r198
> >
> > And the end result is:
> >
> > (insn 20 18 5 2 (set (reg/f:SI 2 $2 [198])
> > (reg/f:SI 29 $sp)) test.c:6 290 {*movsi_mips16}
> >  (nil))
> > (insn 5 20 6 2 (set (reg:SI 4 $4)
> > (plus:SI (reg/f:SI 2 $2 [198])
> > (const_int 16 [0x10]))) test.c:6 13 {*addsi3_mips16}
> >  (nil))
> >
> > The only way I can currently see to get any direct usage of $sp in an add
> instruction would be to artificially reduce the permitted immediate range
> by 1 bit so that there is a single alternative that allows either "ks" or "d"
> with a 15bit immediate. I don't really want to do that though. I initially
> allowed $frame to be treated as per $sp but that led to an ICE when $frame
> was eliminated to $17 and the immediate was out of range.
> >
> > Have I missed anything that would allow me to support the full immediate
> range in all cases?
> Sorry, I can not reproduce this using a test from PR58461 and patch enabling
> LRA for mips.

I'm afraid the patch in the bug report includes some incorrect workarounds 
which were my early attempt at resolving this issue.

> It is hard for me to say what is going on.  Elimination is done when we match
> hard reg against constraints.  May be elimination to hfp is rejected on some
> sub-pass and LRA don't try all alternatives after this.
> 
> Could you send me the test from which you got this dumps and what
> options did you use.  I can say more.   As I understand there should be
> no such problem.  If reload can do right think, LRA should do the same.

Attached is a reduced patch to enable LRA for mips16 with my workarounds 
omitted.

Also attached is frametest.c which should be compiled with -O2 -mips16. Add 
-mreload to compare with reload and you will see the extra instruction that LRA 
produces. When not omitting the frame pointer, LRA and reload behave the same.

Regards,
Matthew



mips16lra.patch
Description: mips16lra.patch
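/* frametest.c: the reduced test case referred to above; compile with
   -O2 -mips16, and add -mreload to compare classic reload with LRA.  */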
int g(int*);
int foo()
{
  int b = 1;
  g(&b);
  return b;
}


RE: addsi3_mips16 and frame pointer with LRA

2013-11-06 Thread Matthew Fortune
> I'll do the patch tomorrow to fix it.  The patch should be not big but it will
> need a lot testing.

Thanks Vladimir. The fix appears to be working.



[MIPS] Avoiding FP operations/register usage

2014-02-07 Thread Matthew Fortune
Hi Richard,

I've been trying to determine for some time whether the MIPS backend has 
successfully guaranteed that even when compiling with hard-float enabled there 
is no floating point code emitted unless you use floating point types.

My most recent reason for looking at this is because I am starting to 
understand/look at mips ld.so from glibc and it appears to make such an 
assumption. I.e. I cannot see it using any specific options to prevent the use 
of floating point but the path into the dynamic linker for resolving symbols 
only preserves integer argument registers and ignores floating point. I have to 
therefore assume that the MIPS backend manages to avoid what I thought was a 
common problem of using floating point registers as integer scratch in extreme 
circumstances.

Another example of where this issue is relevant is the MIPS Linux kernel, 
which explicitly compiles for soft-float; whether this is out of caution or 
necessity I do not know, but I'm interested to figure it out.

Any insight into this would be welcome. If there is no such guarantee (which is 
what I have assumed thus far) then I will go ahead and fix anything that relies on 
avoiding floating point code.

Regards,
Matthew 



RE: [MIPS] Avoiding FP operations/register usage

2014-02-07 Thread Matthew Fortune
> > My most recent reason for looking at this is because I am starting to
> > understand/look at mips ld.so from glibc and it appears to make such
> > an assumption. I.e. I cannot see it using any specific options to
> > prevent the use of floating point but the path into the dynamic linker
> > for resolving symbols only preserves integer argument registers and
> > ignores floating point. I have to therefore assume that the MIPS
> > backend manages to avoid what I thought was a common problem of using
> > floating point registers as integer scratch in extreme circumstances.
> 
> Even if you avoid use of floating point (via -ffixed-* options - check 
> carefully
> that those are actually effective, as for some targets there are or have been
> initialization order issues for registers that are only conditionally 
> available,
> that may make such options ineffective - not -msoft-float, as that would
> mark the objects ABI-incompatible), you'd still need to save and restore call-
> clobbered registers used for argument passing, because IFUNC resolvers,
> audit modules and user implementations of malloc might clobber them.

This is where I was going next with this but I didn't know if it was 
appropriate to go into such things on the GCC list.

> Thus, I think ld.so needs to save and restore those registers (and so there
> isn't much point making it avoid floating point).  See
> .

Thanks for this and I agree. I've read some of the threads on this topic but 
not these. I have also realised I've stumbled my way into something that will 
also affect/be affected by how we define the ABI extension for MSA. If we 
define an ABI extension that uses MSA registers for arguments then these would 
also need saving around dynamic loader entry points.

I'm still interested in how successfully the MIPS backend is managing to avoid 
floating point but I am also convinced there are bugs in ld.so entry points for 
MIPS.

Matthew



RE: [MIPS] Avoiding FP operations/register usage

2014-02-11 Thread Matthew Fortune
> Matthew Fortune  writes:
> > I'm still interested in how successfully the MIPS backend is managing
> > to avoid floating point but I am also convinced there are bugs in
> > ld.so entry points for MIPS.
> 
> It uses the standard mechanism to avoid it, which is marking uses of FP
> registers for integer moves, loads and stores with "*".  This tells the 
> register
> allocator to ignore those alternatives.  AFAIK it is effective and I think any
> cases where it doesn't work would be fair bug reports.

I understand that '*' has no effect on whether reload/LRA will use the 
alternative though so I take that to mean they could still allocate FP regs as 
part of an integer move?

> It becomes a lot more difficult to define with things like the Loongson
> extensions though, since some of those are also useful as scalar integer
> operations.  And of course the same goes for MSA.

Indeed.

Avoiding FP registers 99.9% of the time is fine for performance; it's the 
potential 0.1% I'm concerned about for correctness. I'm tending towards 
accounting for potential FPU usage even from integer-only source just to be 
safe. I don't ever want to be the one debugging something like ld.so in the 
face of this kind of bug.

I'll move the discussion to glibc regarding ld.so.

Regards,
Matthew



[RFC] Rationale for passing vectors by value in SIMD registers

2014-02-14 Thread Matthew Fortune
MIPS is currently evaluating the benefit of using SIMD registers to pass vector 
data by value. It is currently unclear how important it is for vector data to 
be passed in SIMD registers. I.e. the need for passing vector data by value in 
real world code is not immediately obvious. The performance advantage is 
therefore also unclear.

Can anyone offer insight into the rationale behind design decisions made for 
other architectures' ABIs? For example, the x86 and x86_64 calling convention 
for vector data types presumes that they will be passed in SSE/AVX registers and 
raises warnings if they are passed when SSE/AVX support is not enabled. This is 
what MIPS is currently considering, however there are two concerns:

1) What about the ability to create architecture/implementation independent 
APIs that may include vector types in the prototypes. Such APIs may be built 
for varying levels of hardware support to make the most of a specific 
architecture implementation but be called from otherwise implementation 
agnostic code. To support such a scenario we would need to use a common calling 
convention usable on all architecture variants.
2) Although vector types are not specifically covered by existing ABI 
definitions for MIPS we have unfortunately got a defacto standard for how to 
pass these by value. Vector types are simply considered to be small structures 
and passed as such following normal ABI rules. This is still a concern even 
though it is generally accepted that there is some room for change when it 
comes to vector data types in an existing ABI.

If anyone could offer a brief history of the x86 ABI with respect to vector data 
types, that would also be interesting. One question would be whether the use of 
vector registers in the calling convention was only enabled by default once 
there was a critical mass of implementations, and therefore the default ABI was 
changed to start making assumptions about the availability of features like SSE 
and AVX.

Comments from any other architecture that has had to make such changes over 
time would also be welcome.

Thanks in advance,
Matthew



RE: [RFC] Rationale for passing vectors by value in SIMD registers

2014-02-15 Thread Matthew Fortune
> On Fri, Feb 14, 2014 at 2:17 AM, Matthew Fortune
>  wrote:
> > MIPS is currently evaluating the benefit of using SIMD registers to pass
> vector data by value. It is currently unclear how important it is for vector 
> data
> to be passed in SIMD registers. I.e. the need for passing vector data by value
> in real world code is not immediately obvious. The performance advantage is
> therefore also unclear.
> >
> > Can anyone offer insight in the rationale behind decision decisions made
> for other architectures ABIs? For example, the x86 and x86_64 calling
> convention for vector data types presumes that they will passed in SSE/AVX
> registers and raises warnings if passed when sse/avx support is not enabled.
> This is what MIPS is currently considering however there are two concerns:
> >
> > 1) What about the ability to create architecture/implementation
> independent APIs that may include vector types in the prototypes. Such APIs
> may be built for varying levels of hardware support to make the most of a
> specific architecture implementation but be called from otherwise
> implementation agnostic code. To support such a scenario we would need to
> use a common calling convention usable on all architecture variants.
> > 2) Although vector types are not specifically covered by existing ABI
> definitions for MIPS we have unfortunately got a defacto standard for how
> to pass these by value. Vector types are simply considered to be small
> structures and passed as such following normal ABI rules. This is still a
> concern even though it is generally accepted that there is some room for
> change when it comes to vector data types in an existing ABI.
> >
> > If anyone could offer a brief history the x86 ABI with respect to vector 
> > data
> types that may also be interesting. One question would be whether the use
> of vector registers in the calling convention was only enabled by default once
> there was a critical mass of implementations, and therefore the default ABI
> was changed to start making assumptions about the availability of features
> like SSE and AVX.
> >
> > Comments from any other architecture that has had to make such changes
> over time would also be welcome.
> 
> PPC and arm and AARCH64 are common targets where vectors are
> passed/return via value.  The idea is simple, sometimes you have functions
> like vector float vsinf(vector float a) where you want to be faster and avoid 
> a
> round trip to L1 (or even L2).  These kind of functions are common for vector
> programming.  That is extending the scalar versions to the vector versions.

I suppose this cost (L1/L2) is mitigated to some extent if the base ABI were to 
pass a vector in multiple GP/FP registers rather than via the stack. There would 
of course still be a cost to marshal the data between GP/FP and SIMD 
registers. For a support routine like vsinf I would expect it also needs a 
reduced clobber set to ensure that the caller's live SIMD registers don't need 
saving/restoring; such registers would normally be caller-saved. If the routine 
were to clobber all SIMD registers anyway then the improvement in argument 
passing seems negligible.
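
For anyone unfamiliar with the construct being discussed, here is a minimal
sketch using GCC's generic vector extension (illustrative only; whether v4sf
travels in a SIMD register, in GP/FP registers or on the stack is exactly the
ABI question above):

/* A generic vector type, passed and returned by value.  */
typedef float v4sf __attribute__ ((vector_size (16)));

v4sf
vadd (v4sf a, v4sf b)
{
  return a + b;   /* element-wise addition */
}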

Do you/anyone know of any open source projects which have started adopting 
generic vector types and show the use of this kind of construct?

Thanks for your input.

Matthew

> 
> Thanks,
> Andrew Pinski
> 
> >
> > Thanks in advance,
> > Matthew
> >


[RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-21 Thread Matthew Fortune
All,

Imagination Technologies would like to introduce the design of an O32 ABI 
extension for MIPS to allow it to be used in conjunction with MIPS FPUs having 
64-bit floating-point registers. This is a wide-reaching design that involves 
changes to all components of the MIPS toolchain; it is being posted to GCC first 
and will progress on to other tools. This ABI extension is compatible with the 
existing O32 ABI definition and will not require the introduction of new build 
variants (multilibs).

The design document is relatively large and has been placed on the MIPS 
Compiler Team wiki to facilitate review:

http://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinking

The introductory paragraph is copied below:

---
MIPS ABIs have been adjusted many times as the architecture has evolved, 
resulting in the introduction of incompatible ABI variants. Current 
architectural changes lead us to review the state of the O32 ABI and evaluate 
whether the existing ABI can be made more compatible with current and future 
extensions.
The three primary reasons for extending the current O32 ABI are the 
introduction of the MSA ASE, the desire to exploit the FR=1 mode of current 
FPUs and the potential for future architectures to demand that floating point 
units run in the 'FR=1' mode. 
For the avoidance of doubt: 

* The FR=0 mode describes an FPU where we consider registers to be constructed 
of 32-bit parts and (depending on architecture revision) there are either 16 or 
32 single-precision registers and 16 double-precision registers. The 
double-precision registers exist at even indices and their upper halves exist at 
the odd indices. 
* The FR=1 mode describes an FPU with 32 64-bit registers. All registers can be 
used for either single or double-precision data. 
* The MSA ASE requires the FR=1 mode 
---

All aspects of this design are open for discussion but, in particular, feedback 
and suggestions on the following areas are welcome:

* The mechanism in which we mark the mode requirements of binaries (ELF flags 
vs 'other')
* The mechanism in which mode requirements are conveyed from a program loader 
to a running program/dynamic linker.

Regards,
Matthew



RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-24 Thread Matthew Fortune
Richard Sandiford  writes
> Matthew Fortune  writes:
> > All,
> >
> > Imagination Technologies would like to introduce the design of an O32
> > ABI extension for MIPS to allow it to be used in conjunction with MIPS
> > FPUs having 64-bit floating-point registers. This is a wide-reaching
> > design that involves changes to all components of the MIPS toolchain
> > it is being posted to GCC first and will progress on to other tools.
> > This ABI extension is compatible with the existing O32 ABI definition
> > and will not require the introduction of new build variants
> (multilibs).
> >
> > The design document is relatively large and has been placed on the
> > MIPS Compiler Team wiki to facilitate review:
> >
> > http://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinkin
> > g
> 
> Looks good to me.  It'll be interesting to see whether making the odd-
> numbered call-saved-in-fr0 registers available for frx pays off or
> whether it ends up being better to avoid them.

Indeed, I suspect they should be avoided except for leaf functions. You would 
have to be pretty desperate for a register if you use the caller-and-callee 
save registers!

> I understand the need to deprecate the current -mgp32 -mfp64 behaviour.
> I don't think we should deprecate -mfp64 itself though.  Instead, why
> not keep -mfp32 as meaning FR0, -mfp64 meaning FR1 and add -mfpxx for
> modeless?  So rather than deprecating the -mgp32 -mfp64 combination and
> adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form in
> which the odd-numbered registers are call-clobbered rather than the old
> form in which they were call-saved.

Extreme caution is the only reason why the design avoided changing fp64 
behaviour (basically in case anyone had serious objection). If you would be 
happy with a change of behaviour for -mgp32 -mfp64 then that is a great start.
 
> AIUI the old form never really worked reliably due to things like
> newlib's setjmp not preserving the odd-numbered registers, so it doesn't
> seem worth keeping around.  Also, the old form is identified by the GNU
> attribute (4, 4) so it'd be easy for the linker to reject links between
> the old and the new form.

That is true. You will have noticed a number of changes over recent months to 
start fixing fp64 as currently defined but having found this new solution then 
such fixes are no longer important. The lack of support for gp32 fp64 in linux 
is further reason to permit redefining it. Would you be happy to retain the 
same builtin defines for FP64 if changing its behaviour (i.e. __mips_fpr=64)?
 
> The corresponding asm would then be ".set fp=xx".
> 
> Either way, a new .set option would be better than a specific .fr
> directive because it gives you access to the option stack (".set
> push"/".set pop").
> 
> I'm not sure about:
> 
>   If an assembly directive is seen prior to the start of the text
>   section then this modifies the default mode for the module.
> 
> This isn't how any of the existing options work and I think the
> inconsistency would be confusing.  It also means that if the first
> function in a file happens to force a local mode (e.g.
> because it's an ifunc implementation) then you'd have to remember to
> write:
> 
>   .fr x
>   .fr 1
> 
> so that the first sets the mode for the module and the second sets it
> for the first function.  The different treatment of the two lines
> wouldn't be obvious at first glance.
> 
> How about instead having a separate directive that explicitly sets the
> global value of an option?  I.e. something like ".module", taking the
> same options as ".set".  Better names welcome. :-)

Use of a different directive to actually affect the overall mode of a module 
sounds like a good plan and it avoids the weird behaviour. The only thing 
specifically needed is that the assembly file records the mode it was written 
for. Getting the wrong command line option would otherwise lead to unusual 
runtime failures. We have been/are still discussing this point so it's no 
surprise you have commented on it too. I'll wait for any further comments on 
this area and update accordingly.
 
> The scheme allows an ifunc to request a mode and effectively gives the
> choice to the firstcomer.  Every other ifunc then has to live with the
> choice.  I don't think that's a good idea, since the order that ifuncs
> are called isn't well-defined or under direct user control.
> 
> Since ifuncs would have to live with a choice made by other ifuncs, in
> practice they must all be prepared to deal with FR=1 if linked into a
> fully-modeless or FR1 program, and

RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-24 Thread Matthew Fortune
Richard Sandiford  writes
> >> AIUI the old form never really worked reliably due to things like
> >> newlib's setjmp not preserving the odd-numbered registers, so it
> >> doesn't seem worth keeping around.  Also, the old form is identified
> >> by the GNU attribute (4, 4) so it'd be easy for the linker to reject
> >> links between the old and the new form.
> >
> > That is true. You will have noticed a number of changes over recent
> > months to start fixing fp64 as currently defined but having found this
> > new solution then such fixes are no longer important. The lack of
> > support for gp32 fp64 in linux is further reason to permit redefining
> > it. Would you be happy to retain the same builtin defines for FP64 if
> > changing its behaviour (i.e. __mips_fpr=64)?
> 
> I think that should be OK.  I suppose a natural follow-on question is
> what __mips_fpr should be for -mfpxx.  Maybe just 0?

I'm doing just that in my experimental implementation of all this.
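
For illustration, a sketch of how source code might key off the macro under
the scheme being discussed; 32 and 64 are the existing values, and 0 for
-mfpxx is the suggestion above rather than anything final:

#if __mips_fpr == 64
/* Built -mfp64: 32 x 64-bit FPRs (FR=1).  */
#elif __mips_fpr == 32
/* Built -mfp32: the traditional FR=0 register model.  */
#elif __mips_fpr == 0
/* Built -mfpxx: modeless, must work in either FR mode.  */
#endif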
 
> If we want to be extra cautious we could define a second set of macros
> alongside the old ones.
> 
> >> You allow the mode to be changed midexecution if a new FR0 or FR1
> >> object is loaded.  Is it really worth supporting that though?
> >> It has the same problem as the ifuncs: once you've dlopen()ed an
> >> object, you fix the mode for the whole program, even after the
> dlclose().
> >> Unless we know of specific cases where this is needed, maybe it would
> >> be safer to fix the mode before execution based on DT_NEEDED
> >> libraries and allow the mode of modeless programs to be overridden by
> >> an environment variable.
> >
> > Scanning the entire set of DT_NEEDED libraries would achieve most of
> > what full dynamic mode switching gives us, it is essentially the first
> > stage of the dynamic mode switching described in the proposal anyway.
> > However, I am concerned about excluding dlopen()ed objects from mode
> > selection though (not so worried about excluding ifunc, that could
> > just fix the mode before resolving the first one). One specific
> > concern is for Android where I believe we have the situation where
> > native applications are loaded as (a form of) shared library. This
> > means a mode requirement can be introduced late on. In an Android
> > environment it is unlikely to be acceptable to have to do something
> > special to load an application that happens to have a specific mode
> > requirement so dynamic selection is useful. This is more of a
> > transitional problem than anything but making it a smooth process is
> > quite important. I'm also not sure that there is much more effort
> > required for a dynamic linker to take account of dlopen()ed objects in
> > addition to DT_NEEDED, changes are needed in this code regardless.
> 
> As far as GNU/Linux goes, if we do end up with a function in something
> like a modeless libm that is implemented as an FR-aware ifunc, that
> would force the choice to be made early anyway.  So we have this very
> specific case where everything in the initial process is modeless, no
> ifuncs take advantage of the FR setting, and a dlopen()ed object was
> compiled as fr0 rather than modeless.  I agree it's possible but it
> seems unlikely.

A reasonable point.
 
> I know nothing about the way Android loading works though. :-) Could you
> describe it in more detail?  Is it significantly different from glibc's
> dynamic loader running a PIE?

I am working from fragments of information on this aspect still so I need to 
get more clarification from Android developers. My current understanding is 
that native parts of applications are actually shared libraries and form part 
of, but not necessarily the entry to, an application. Since such a shared 
library can't be 'required' by anything it must be loaded explicitly. I'll get 
clarification but the potential need for dynamic mode switching in Android need 
not affect the decision that GNU/Linux takes.
 
> >> If we do end up using ELF flags then maybe adding two new EF_MIPS_ABI
> >> enums would be better.  It's more likely to be trapped by old loaders
> >> and avoids eating up those precious remaining bits.
> >
> > Sound's reasonable but I'm still trying to determine how this
> > information can be propagated from loader to dynamic loader.
> 
> The dynamic loader has access to the ELF headers so I didn't think it
> would need any help.

As I understand it the dynamic loader only has specific access to the program 
headers of the executable not the ELF headers. There is no question that the 
dynamic loader has access to DSO ELF headers but we need the start point too.
 
> >> You didn't say specifically how a static program's crt code would
> >> know whether it was linked as modeless or in a specific FR mode.
> >> Maybe the linker could define a special hidden symbol?
> >
> > Why do you say crt rather than dlopen? The mode requirement should
> > only matter if you want to change it and dlopen should be able to
> > access information in the same way 

RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-25 Thread Matthew Fortune
Richard Sandiford  writes
> Doug Gilmore  writes:
> > On 02/24/2014 10:42 AM, Richard Sandiford wrote:
> >>...
> >>> AIUI the old form never really worked reliably due to things like
> >>> newlib's setjmp not preserving the odd-numbered registers, so it
> >>> doesn't
>  seem worth keeping around.  Also, the old form is identified by the
>  GNU attribute (4, 4) so it'd be easy for the linker to reject links
>  between the old and the new form.
> >>>
> >>> That is true. You will have noticed a number of changes over recent
> >>> months to start fixing fp64 as currently defined but having found
> >>> this new solution then such fixes are no longer important. The lack
> >>> of support for gp32 fp64 in linux is further reason to permit
> >>> redefining it. Would you be happy to retain the same builtin defines
> >>> for FP64 if changing its behaviour (i.e. __mips_fpr=64)?
> >>
> >>I think that should be OK.  I suppose a natural follow-on question is
> >>what __mips_fpr should be for -mfpxx.  Maybe just 0?
> > I think we should think carefully about just making -mfp64 just
> disappear.
> > The support has existed for bare iron for quite a while, and we do
> > internal testing of MSA using -mfp64.  I'd rather avoid a flag day.
> > It would be good to continue recognizing that object files with
> > attribute (4, 4)
> > (-mfp64) are not compatible with other objects.
> 
> Right, that was the idea.  (4, 4) would always mean the current form of
> -mfp64 and the linker would reject links between (4, 4) and the new -
> mfp64 form.
> 
> The flag day was more on the GCC and GAS side.  I don't see the point in
> supporting both forms there at the same time, since it significantly
> complicates the interface and since AIUI the old form was never really
> suitable for production use.

That sounds OK to me.

I'm aiming to have an experimental implementation of the calling convention 
changes as soon as possible although I am having difficulties getting the frx 
calling convention working correctly.

The problem is that frx needs to treat registers as 64-bit sometimes and 32-bit 
at other times.
a) I need the aliasing that 32-bit registers give me (use of an even-numbered 
double clobbers the corresponding odd-numbered single); this is to prevent both 
the double and the odd-numbered single being used simultaneously.
b) I need the 64-bit register layout to ensure that 64-bit values in caller-saved 
registers are saved as 64-bit (rather than 2x32-bit) and 32-bit registers are 
saved as 32-bit and never combined into a 64-bit save. caller-save.c flattens 
the caller-save problem down to look only at hard registers, not modes, which is 
frustrating.

It looks like caller-save.c would need a lot of work to achieve b) with 32-bit 
hard registers but I equally don't know how I could achieve a) for 64-bit 
registers. I suspect a) is marginally easier to solve in the end, but I would have to 
find a way to say that using reg x as 64-bit prevents allocation of x+1 as 
32-bit despite registers being 64-bit. The easy option is to go for 64-bit 
registers and never use odd-numbered registers for single-precision or 
double-precision but I don't really want frx to be limited to that if at all 
possible. Any suggestions?

The special handling for callee-saved registers is not a problem (I think) as 
it is all backend code for that (assuming a or b is resolved).

Regards,
Matthew


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-02-25 Thread Matthew Fortune
> Matthew Fortune  writes:
> >> >> If we do end up using ELF flags then maybe adding two new
> >> >> EF_MIPS_ABI enums would be better.  It's more likely to be trapped
> >> >> by old loaders and avoids eating up those precious remaining bits.
> >> >
> >> > Sound's reasonable but I'm still trying to determine how this
> >> > information can be propagated from loader to dynamic loader.
> >>
> >> The dynamic loader has access to the ELF headers so I didn't think it
> >> would need any help.
> >
> > As I understand it the dynamic loader only has specific access to the
> > program headers of the executable not the ELF headers. There is no
> > question that the dynamic loader has access to DSO ELF headers but we
> > need the start point too.
> 
> Sorry, forgot about that.  In that case maybe program headers would be
> best, like you say.  I.e. we could use a combination of GNU attributes
> and a new program header, with the program header hopefully being more
> general than for just this case.  I suppose this comes back to the
> thread from binutils@ last year about how to manage the dwindling number
> of free flags:
> 
> https://www.sourceware.org/ml/binutils/2013-09/msg00039.html
>  to https://www.sourceware.org/ml/binutils/2013-09/msg00099.html
> 
> >> >> You didn't say specifically how a static program's crt code would
> >> >> know whether it was linked as modeless or in a specific FR mode.
> >> >> Maybe the linker could define a special hidden symbol?
> >> >
> >> > Why do you say crt rather than dlopen? The mode requirement should
> >> > only matter if you want to change it and dlopen should be able to
> >> > access information in the same way that a dynamic linker would. It
> >> > may seem redundant but perhaps we end up having to mark an
> >> > executable with mode requirements in two ways. The primary one
> >> > being the ELF flag and the secondary one being a processor specific
> >> > program header. The ELF flags are easy to use/already used for the
> >> > program loader and when scanning the needs of an object being
> >> > loaded, but the program header is something that is easy to inspect
> for an already-loaded object.
> >> > Overall though, a new program header would be sufficient in all
> >> > cases, with a few modifications here and there.
> >>
> >> Sorry, what I meant was: how would an executable built with -static
> >> be handled?  And I was assuming it would be up to the executable's
> >> startup code to set the FR mode.  That startup code (from glibc)
> >> would normally be modeless itself but would need to know whether any
> >> FR0 or FR1 objects were linked in.  (FWIW ifuncs have a similar
> >> problem: without the loader to help, the startup code has to resolve
> >> the ifuncs itself.  The static linker defines special symbols around
> >> a block of IRELATIVE relocs and then the startup code applies those
> >> relocs in a similar way to the dynamic linker.  I was thinking a
> >> linker-defined symbol could be used to record the FR mode too.)
> >>
> >> But perhaps you were thinking of getting the kernel to set the FR
> >> mode instead?
> >
> > I was thinking the kernel would set an initial FR mode that was at
> > least compatible with the ELF flags. Do you feel all this should be
> > done in user space? We only get user mode FR control in MIPS r5 so
> > this would make it more challenging to get into FR1 mode for MIPS32r2.
> > I'd prefer not to be able to load an FR1 program than crash in the crt
> > while trying to turn it on. There is however some expectation that the
> > kernel would trap and emulate UFR on MIPS32r2 for the dynamic loader
> case anyway.
> 
> Right -- the kernel needs to let userspace change FR if the dynamic
> loader case is going to work.  And I think if it's handled by userspace
> for dynamic executables then it should be handled by userspace for
> static ones too.  Especially since the mechanism used for static
> executables would then be the same as for bare metal, meaning that we
> only really have 2 cases rather than 3.

Although the dynamic case does mean mode switching must be possible at user 
level I do think it is important for the OS and bare metal crt to prepare an 
environment that is suitable for the original program including setting an 
appropriate FR mode. I would use the existing support for linux and bare metal 
for getting the fr mode correct f

RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-03 Thread Matthew Fortune
> > Sorry, forgot about that.  In that case maybe program headers would be
> > best, like you say.  I.e. we could use a combination of GNU attributes
> > and a new program header, with the program header hopefully being more
> > general than for just this case.  I suppose this comes back to the
> > thread from binutils@ last year about how to manage the dwindling
> > number of free flags:
> >
> > https://www.sourceware.org/ml/binutils/2013-09/msg00039.html
> >  to https://www.sourceware.org/ml/binutils/2013-09/msg00099.html
> >

There are a couple of issues to resolve in order to use gnu attributes to 
record FP requirements at the module level. As it currently stands gnu 
attributes are controlled via the .gnu_attribute directive and these are 
emitted explicitly by the compiler. I think it is important that a more 
meaningful directive is available but it will need to interact nicely with the 
.gnu_attribute as well.

The first problem is that there will be new ways to influence whether a gnu 
attribute is emitted or not, i.e. the command line options -mfp32, -mfpxx and 
-mfp64 will imply the relevant Tag_GNU_MIPS_ABI_FP attribute, and if the .module 
directive is present then that would override it. Will there be any problems 
with new ways to generate a gnu attribute?

The second problem is that in order to support relaxing a mode requirement, 
any up-front directive/command line option that sets a specific fp32/fp64 
requirement needs to be updated to fpxx. With gnu attributes this would mean 
updating an existing Tag_GNU_MIPS_ABI_FP setting to be modeless.

I don't think any other port does this kind of thing in binutils but that 
doesn't mean we can't I guess.

Regards,
Matthew


RE: [RFC][ARM] Naming for new switch to check for mixed hardfloat/softfloat compat

2014-03-04 Thread Matthew Fortune
Hi Thomas,

Do you particularly need a switch for this? You could view this as simply 
relaxing the ABI requirements of a module; a switch would only serve to enforce 
the need for a compatible ABI and error out if not. If you build something for a 
soft-float ABI and never actually trigger any of the soft-float-specific parts 
of the ABI then you could safely mark the module as no-float ABI (same for 
hard-float). A simple check would be whether floating-point types are used in 
parameters or returns, but it could be more refined still: the module is still 
compatible if none of the arguments or returns would be passed differently 
between the two ABI variants. The problem with relaxing the ABI is that you only 
know whether it can be relaxed at the end of compiling all functions. I am 
currently doing some work for MIPS where the assembler will calculate overall 
requirements based on an initial setting and then analysis of the code in the 
module. To relax a floating-point ABI I would expect to emit an ABI attribute at 
the head of a file, which is either soft or hard float, but then each function 
would get an attribute to say whether it ended up as a compatible ABI. If all 
global functions say compatible then the module can be relaxed to a compatible 
FP ABI.
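
As a purely illustrative example of the kind of signature check I mean (plain C, 
not tied to any particular implementation):

  /* No floating-point types in the public signature, so base-standard
     (soft-float) and VFP-variant (hard-float) callers pass the arguments
     identically: this function could be marked ABI-agnostic.  */
  int count_words (const char *s);

  /* The double parameter and return value are passed differently under
     the two calling conventions, so this function pins the module to one
     FP ABI or the other.  */
  double scale (double value, double factor);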

I think the ability to detect the case of generating ABI agnostic code would be 
useful for other architectures too.

MIPS does have an option for something similar to this which is -mno-float but 
it does not really do what you are aiming for here. The -mno-float option marks 
a module as float ABI agnostic but actually performs code gen for a soft-float 
ABI. It is up to the programmer to avoid floating point in function signatures. 
Perhaps this option would be useful to support the enforced compatible ABI but 
being able to relax the ABI is better still as it would require no effort from 
the end user. I'm planning on proposing this kind of change for MIPS in the 
near future.
 
Regards,
Matthew

> -Original Message-
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Thomas Preudhomme
> Sent: 04 March 2014 07:02
> To: gcc@gcc.gnu.org
> Subject: [RFC][ARM] Naming for new switch to check for mixed
> hardfloat/softfloat compat
> 
> [Please CC me as I'm not subscribed to this list]
> 
> Hi there,
> 
> I'm currently working on adding a switch to check whether public
> function involve float parameters or return values. Such a check would
> be useful for people trying to write code that is compatible with both
> base standard (softfloat) and standard variant (hardfloat) ARM calling
> convention. I also intend to set the ELF attribute Tag_ABI_VFP_args to
> value 3 (code compatible with both ABI) so this check would allow to
> make sure such value would be set.
> 
> I initially thought about reusing -mfloat-abi with the value none for
> that purpose since it would somehow define a new ABI where no float can
> be used. However, it would then not be possible to forbid float in
> public interface with the use of VFP instructions for float arithmetic
> (softfp) because this switch conflates the float ABI with the use of a
> floating point unit for float arithmetic. Also, gcc passes -mfloat-abi
> down to the assembler and that would mean teaching the assembler about -
> mfloat-abi=none as well.
> 
> I thus think that a new switch would be better and am asking for your
> opinion about it as I would like this functionality to incorporate gcc
> codebase.
> 
> Best regards,
> 
> Thomas Preud'homme


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-04 Thread Matthew Fortune
> Matthew Fortune  writes:
> >> > Sorry, forgot about that.  In that case maybe program headers would
> >> > be best, like you say.  I.e. we could use a combination of GNU
> >> > attributes and a new program header, with the program header
> >> > hopefully being more general than for just this case.  I suppose
> >> > this comes back to the thread from binutils@ last year about how to
> >> > manage the dwindling number of free flags:
> >> >
> >> > https://www.sourceware.org/ml/binutils/2013-09/msg00039.html
> >> >  to https://www.sourceware.org/ml/binutils/2013-09/msg00099.html
> >> >
> >
> > There are a couple of issues to resolve in order to use gnu attributes
> > to record FP requirements at the module level. As it currently stands
> > gnu attributes are controlled via the .gnu_attribute directive and
> > these are emitted explicitly by the compiler. I think it is important
> > that a more meaningful directive is available but it will need to
> > interact nicely with the .gnu_attribute as well.
> >
> > The first problem is that there will be new ways to influence whether
> > a gnu attribute is emitted or not. i.e. the command line options
> > -mfp32, -mfpxx, -mfp64 will infer the relevant attribute
> > Tag_GNU_MIPS_ABI_FP and if the .module directive is present then that
> > would override it. Will there be any problems with a new ways to
> generate a gnu attribute?
> 
> I think we should just give an error if any .gnu_attributes are
> inconsistent with the module-level setting (whether that comes from
> .module or command-line flags).

I would need to account for the -msoft-float and -msingle-float command line 
options to calculate the module-level setting in order to do this, which is fine. 
There is however no way to infer the no-float ABI from command line options as 
it is not passed through from the GCC driver. This would mean the no-float ABI 
would always conflict with the module level setting. I suspect the only answer 
is to make an exception and allow a .gnu_attribute 4,0 to take precedence over 
a command line option (but not a .module option). This seems a little 
convoluted in the end.

The only other alternative is to just allow the .module fp=... options to act 
as human readable aliases for the .gnu_attribute options and take whatever 
comes last.

> > The second problem is that in order to support relaxing a mode
> > requirement then any up-front directive/command line option that sets
> > a specific fp32/fp64 requirement needs to be updated to fpxx. With gnu
> > attributes this would mean updating an existing Tag_GNU_MIPS_ABI_FP
> > setting to be modeless.
> 
> Not sure what you mean here, sorry.

At the end of a unit we will know whether an FP32 or FP64 ABI can be relaxed to 
FPXX. This will happen if no floating point code has been emitted that uses 
odd-numbered registers. All I was checking is that it is going to be acceptable 
to alter the FP ABI attribute even if it was set using the .gnu_attribute 
directive. I know I 'can' do it in the code, as I have it working already; I'm 
just checking that it is OK. I suppose this case is going to be quite rare 
(hand-written assembly code that includes a .gnu_attribute 4,1 which is mode 
safe) but I'd like to catch as many cases as possible and relax the ABI.
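
The relaxation itself is simple; roughly (illustrative C only, not the actual 
implementation, and using the odd-register criterion described above):

  /* At the end of the unit, relax an explicit FP32/FP64 requirement to
     FPXX when nothing in the emitted code pinned the register mode,
     i.e. no odd-numbered FP registers were referenced.  */
  enum fp_abi { FP_ANY, FP_32, FP_XX, FP_64 };

  static enum fp_abi
  relax_fp_abi (enum fp_abi requested, int used_odd_fp_regs)
  {
    if ((requested == FP_32 || requested == FP_64) && !used_odd_fp_regs)
      return FP_XX;
    return requested;
  }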

Regards,
Matthew


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-04 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> >> Matthew Fortune  writes:
> >> >> > Sorry, forgot about that.  In that case maybe program headers
> >> >> > would be best, like you say.  I.e. we could use a combination of
> >> >> > GNU attributes and a new program header, with the program header
> >> >> > hopefully being more general than for just this case.  I suppose
> >> >> > this comes back to the thread from binutils@ last year about how
> >> >> > to manage the dwindling number of free flags:
> >> >> >
> >> >> > https://www.sourceware.org/ml/binutils/2013-09/msg00039.html
> >> >> >  to https://www.sourceware.org/ml/binutils/2013-09/msg00099.html
> >> >> >
> >> >
> >> > There are a couple of issues to resolve in order to use gnu
> >> > attributes to record FP requirements at the module level. As it
> >> > currently stands gnu attributes are controlled via the
> >> > .gnu_attribute directive and these are emitted explicitly by the
> >> > compiler. I think it is important that a more meaningful directive
> >> > is available but it will need to interact nicely with the
> .gnu_attribute as well.
> >> >
> >> > The first problem is that there will be new ways to influence
> >> > whether a gnu attribute is emitted or not. i.e. the command line
> >> > options -mfp32, -mfpxx, -mfp64 will infer the relevant attribute
> >> > Tag_GNU_MIPS_ABI_FP and if the .module directive is present then
> >> > that would override it. Will there be any problems with a new ways
> >> > to
> >> generate a gnu attribute?
> >>
> >> I think we should just give an error if any .gnu_attributes are
> >> inconsistent with the module-level setting (whether that comes from
> >> .module or command-line flags).
> >
> > I would need to account for the -msoft-float and -msingle-float
> > command line options to calculate module-level setting in order to do
> > this, which is fine. There is however no way to infer the no-float ABI
> > from command line options as it is not passed through from the GCC
> > driver. This would mean the no-float ABI would always conflict with
> > the module level setting.
> 
> -mno-float the GCC option doesn't really select a different ABI.
> It does leave the FP attribute as being the default 0/dont-care, but
> that's just like it would be when compiling most hand-written assembly
> code, including code written before -mno-float or .gnu_attribute was
> invented.
> 
> > I suspect the only answer is to make an exception and allow a
> > .gnu_attribute 4,0 to take precedence over a command line option (but
> > not a .module option). This seems a little convoluted in the end.
> 
> I don't think we should ever override an explicit .gnu_attribute.
> The most we can do is report a contradiction.
> 
> >> > The second problem is that in order to support relaxing a mode
> >> > requirement then any up-front directive/command line option that
> >> > sets a specific fp32/fp64 requirement needs to be updated to fpxx.
> >> > With gnu attributes this would mean updating an existing
> >> > Tag_GNU_MIPS_ABI_FP setting to be modeless.
> >>
> >> Not sure what you mean here, sorry.
> >
> > At the end of a unit we will know whether an FP32 or FP64 ABI can be
> > relaxed to FPXX. This will happen if no floating point code has been
> > emitted that uses odd numbered registers. All I was checking is that
> > it is going to be acceptable to alter the FP ABI attribute even if it
> > was set using the .gnu_attribute directive. I know I 'can' do it in
> > the code as I have it working already just checking that it is OK. I
> > suppose this case is going to be quite rare (hand written assembly
> > code that includes a .gnu_attribute 4,1 which is mode safe) but I'd
> > like to catch as many cases as possible and relax the ABI.
> 
> Yeah, I don't think we should do any relaxation like that.  If a
> specific attribute value is chosen we should honour it even if it
> doesn't seem necessary.  If -mfp32, -mfp64, .module fp=32 or .module
> fp=64 is used we should honour it even if -mfpxx or .module fp=xx seems
> OK.

Are you OK with automatically selecting fpxx if no -mfp option, no .module 
and no .gnu_attribute exists? Such code would currently end up as FP ABI Any 
even if FP code was present; I don't suppose anything would get worse if this 
existing behaviour simply continued though.

RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-05 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Are you're OK with automatically selecting fpxx if no -mfp option, no
> > .module and no .gnu_attribute exists? Such code would currently end up
> > as FP ABI Any even if FP code was present, I don't suppose anything
> > would get worse if this existing behaviour simply continued though.
> 
> The -mfp setting is usually implied by the -mabi setting.  I don't think
> we should change that.  Since this is a new mode, and since the fpxx
> markup will be available from the start, everyone using fpxx should say
> so explicitly.
> 
> E.g. maybe the rules should be:
> 
> (1) Any explicit .gnu_attribute 4 is always used, although we might
> give a diagnostic if it's incompatible with the module-level
> setting.
> 
> (2) Otherwise, if the code does not use FP then the attribute is left
> at the default of 0.
> 
> (3) Otherwise, a nonzero .gnu_attribute 4 is implied from the module-
> level
> setting.
> 
> (4) For compatibility, -mabi=32 continues to imply -mfp32.  fpxx mode
> must
> be selected explicitly.
> 
> Which was supposed to be simple, but maybe isn't so much.

This sounds OK. I'd rather (4) permitted transition to fpxx for 'safe' FP code 
but let's see if we can do without it. Setjmp/longjmp are the only obvious 
candidates for using FP code in assembly files and these need to transition to 
fpxx.
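
Roughly, as I read those rules (a sketch only, with illustrative names rather 
than real gas internals):

  /* Sketch of rules (1)-(3) above for the value emitted as
     .gnu_attribute 4 (Tag_GNU_MIPS_ABI_FP); 0 is the 'Any' default.
     Rule (4) is a driver/default question and doesn't appear here.  */
  static int
  emitted_fp_attribute (int have_explicit_attr, int explicit_attr,
                        int unit_uses_fp, int module_level_attr)
  {
    if (have_explicit_attr)
      return explicit_attr;       /* (1) explicit .gnu_attribute wins    */
    if (!unit_uses_fp)
      return 0;                   /* (2) no FP code: leave it at Any     */
    return module_level_attr;     /* (3) implied from the module setting */
  }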

The glibc implementation of setjmp/longjmp is in C, so the new defaults from the 
compiler will lead to this being fpxx, as -mips32r2 will imply -mfpxx. That is 
OK: these modules will be tagged as fpxx.

Currently newlib's implementation is assembly code with no .gnu_attributes. 
Under the rules above this would start to be implicitly tagged as gnu_attribute 
4,1 (fp32). Any thoughts on how we transition this to fpxx and still have the 
modules buildable with old tools as well? I'm not sure if it will be acceptable 
to say that it has to be rewritten in C.

There will also be uclibc and bionic to deal with too for setjmp/longjmp but I 
don't have their source to hand.

Regards,
Matthew


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-05 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Richard Sandiford  writes:
> >> Matthew Fortune  writes:
> >> > Are you're OK with automatically selecting fpxx if no -mfp option,
> >> > no .module and no .gnu_attribute exists? Such code would currently
> >> > end up as FP ABI Any even if FP code was present, I don't suppose
> >> > anything would get worse if this existing behaviour simply
> continued though.
> >>
> >> The -mfp setting is usually implied by the -mabi setting.  I don't
> >> think we should change that.  Since this is a new mode, and since the
> >> fpxx markup will be available from the start, everyone using fpxx
> >> should say so explicitly.
> >>
> >> E.g. maybe the rules should be:
> >>
> >> (1) Any explicit .gnu_attribute 4 is always used, although we might
> >> give a diagnostic if it's incompatible with the module-level
> >> setting.
> >>
> >> (2) Otherwise, if the code does not use FP then the attribute is left
> >> at the default of 0.
> >>
> >> (3) Otherwise, a nonzero .gnu_attribute 4 is implied from the module-
> >> level
> >> setting.
> >>
> >> (4) For compatibility, -mabi=32 continues to imply -mfp32.  fpxx mode
> >> must
> >> be selected explicitly.
> >>
> >> Which was supposed to be simple, but maybe isn't so much.
> >
> > This sounds OK. I'd rather (4) permitted transition to fpxx for 'safe'
> > FP code but let's see if we can do without it. Setjmp/longjmp are the
> > only obvious candidates for using FP code in assembly files and these
> > need to transition to fpxx.
> >
> > The glibc implementation of setjmp/longjmp is in C so the new defaults
> > from the compiler will lead to this being fpxx as -mips32r2 will imply
> > -mfpxx so that is OK, these modules will be tagged as fpxx.
> 
> Hmm, I don't think -mips32r2 should make any difference here.
> You've specified it so that fpxx will work with MIPS II and above and
> I'd prefer not to have an architecture option implicitly changing the
> ABI.  (They sometimes did in the long-distant past but it just led to
> confusion.)

I didn't mean to single out mips32r2 here; it applies equally to anything except 
mips1 with O32. 

> I think instead we should have a configuration switch that allows a
> particular -mfp option to be inserted alongside -mabi=32 if no explicit
> -mfp is given.  This is how most --with options work.  Maybe --with-fp-
> 32={32|64|xx}?  Specific triples could set a default value if they like.
> E.g. the MTI, SDE and mipsisa* ones would probably want to default to --
> with-32-fp=xx.  Triples aimed at MIPS IV and below would stay as they
> are.  (MIPS IV is sometimes used with -mabi=32.)
> 
> --with-fp-32 isn't the greatest name but is at least consistent with
> --with-arch-32 and -mabi=32.  Maybe --with-fp-32=64 is so weird that
> breaking consistency is better though.

Tying the use of fpxx by default to a configure-time setting is OK with me. 
When enabled it would still have to follow the rules as defined in the design, 
in that it can only apply to architectures that can support the variant. 
Currently that means everything but mips1. I'm not sure this is the same as 
tying an ABI to an architecture, as both fp32 and fpxx are O32 and link 
compatible. Perhaps the configure switch would be --with-o32-fp={32|64|xx}. 
This shows it is just an O32-related setting.

> > Currently newlib's implementation is assembly code with no
> > .gnu_attributes. Under the rules above this would start to be
> > implicitly tagged as gnu_attribute 4,1 (fp32). Any thoughts on how we
> > transition this to fpxx and still have the modules buildable with old
> > tools as well? I'm not sure if it will be acceptable to say that it
> > has to be rewritten in C.
> 
> If it's assembled as -mfpxx then it'll be implicitly tagged with the new
> .gnu_attribute rather than 4,1.  If it's not assembled as -mfpxx then
> 4,1 would be the right choice.

So this would be dependent on the build system ensuring -mfpxx is passed as 
appropriate if the toolchain supports it. There is some risk in this too: if the 
existing code (which I know is not fpxx-safe) gets built with a new toolchain 
then it will be tagged as fpxx. I wonder if this tells us that command line 
options cannot safely set the FP ABI away from the default. Instead only the 
.module and .gnu_attribute directives can set it, as only the source code can 
know what FP mode it was written for. The change to your 4 points above would be 
that the module-level setting is not impacted by the command line -mfp options.

This would then require us to have an explicit attribute in the source to 
select fpxx which would need to be optionally included dependent on assembler 
support for .module. (The relaxation would have helped here of course.)

> Thanks,
> Richard


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-13 Thread Matthew Fortune
Hi Richard/all,

The spec on: 
https://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinking has 
been updated and attempts to account for all the feedback. Not everything has 
been possible to simplify/rework as requested but I believe I have managed to 
address many points cleanly.

Sections 9 and 10 contain pretty much all the changes and a fresh read is 
better than me attempting to summarise. All the attributes, flags, etc are now 
defined with specific values and specific comments regarding kernel support for 
UFR have been added.

I have an implementation of everything except the program loader which I will 
post to ensure the overall approach in code is acceptable. I'll do this on each 
project's list appropriately. Since I am only just starting testing I have no 
test cases to offer alongside the patches currently, I will be working on that 
next. I'm deferring writing all the tests in case the implementation/behaviour 
changes, hence initial review for now.

The implementation in GCC relies on LRA due to the way in which caller-save is 
handled. A patch to enable LRA and fix all regressions is being developed 
concurrently and will be posted ready for stage 1.

Let me know if there is any feedback on the updated spec.

I'm afraid the last aspect we discussed is still a point of contention :-) I'm 
sure we'll get there though. I've added more comments inline below:

Richard Sandiford  writes:
> Matthew Fortune  writes:
> >> I think instead we should have a configuration switch that allows a
> >> particular -mfp option to be inserted alongside -mabi=32 if no
> >> explicit -mfp is given.  This is how most --with options work.  Maybe
> >> --with-fp- 32={32|64|xx}?  Specific triples could set a default value
> if they like.
> >> E.g. the MTI, SDE and mipsisa* ones would probably want to default to
> >> -- with-32-fp=xx.  Triples aimed at MIPS IV and below would stay as
> >> they are.  (MIPS IV is sometimes used with -mabi=32.)
> >>
> >> --with-fp-32 isn't the greatest name but is at least consistent with
> >> --with-arch-32 and -mabi=32.  Maybe --with-fp-32=64 is so weird that
> >> breaking consistency is better though.
> >
> > Tying the use of fpxx by default to a configure time setting is OK
> > with me. When enabled it would still have to follow the rules as
> > defined in the design in that it can only apply to architectures that
> > can support the variant.
> 
> Right.  It's really equivalent to putting the -mfp on every command line
> that doesn't have one.
> 
> > Currently that means everything but mips1.
> 
> Yeah, using -mips1 on a --with-{o}32-fp=xx toolchain would be an error.
> 
> > I'm not sure this is the same as tying an ABI to an architecture as
> > both fp32 and fpxx are O32 and link compatible. Perhaps the configure
> > switch would be --with-o32-fp={32|64|xx}. This shows it is just an O32
> > related setting.
> 
> What I meant is that -march= and -mips shouldn't imply a different -mfp
> setting.  The -mfp setting should be self-contained and it should be an
> error if the architecture isn't compatible.
> 
> We might be in violent agreement here :-)  Like I say, I was just a bit
> worried by the earlier -mips32r2 thing because there was a time when a -
> mips option really could imply things like -mabi, -mgp and -mfp.
> 
> --with-o32-fp would be OK with me.  I'm just worried about the ABI being
> spelt differently from -mabi=, but there's probably no perfect
> alternative.

I'd like to encourage the perspective that the -mfp* options do not lead to a 
different ABI in the same sense that other variations do. While it is true that 
the calling conventions and code generation rules vary, 2 out of 3 combinations 
of -mfp32, -mfpxx and -mfp64 with -mabi=o32 are link compatible. The 
introduction of the modeless O32 ABI is intended to remove the part of the O32 
definition that says 'FR=0'; the architecture then gets to dictate this and the 
generated code is still O32. It is true today that we have several 
architectures that mandate FR=0, some that cannot support fpxx and some that 
can support all fp* variations. I see nothing preventing a future architecture 
that only supports FR=1, though, which we should also think about. When 
considering such a scenario it would be highly desirable for the following to 
just work, as I believe architectural restrictions should be accounted for when 
designing default options. If the architecture gives no choice then it should 
just work IMO:

Some ideas (speculating that someone builds a core called mips_n with only 
FR=1):

--with-o32-fp=32

mips-*-gcc -march=mips1 fp.c ==> generates fp32 code
mips-*-gcc -march=mips2 fp.c ==> generates fp32 code
mips-*-gcc -march=mips32r2 fp.c ==> generates fp32 code
mips-*-gcc -march=mips32r2 -mfp64 fp.c ==> generates fp64 code
mips-*-gcc -march=mips_n fp.c ==> generates fp64 code
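
For reference, the pairwise link compatibility I described above (fpxx links 
with either fp32 or fp64, but fp32 and fp64 do not link with each other) could 
be sketched as follows; this is purely illustrative C:

  /* Illustrative only: which pairs of O32 FP variants can be linked
     together, per the description above.  */
  enum o32_fp_variant { O32_FP32, O32_FPXX, O32_FP64 };

  static int
  o32_fp_link_compatible (enum o32_fp_variant a, enum o32_fp_variant b)
  {
    if (a == b)
      return 1;                             /* same variant always links */
    return a == O32_FPXX || b == O32_FPXX;  /* fpxx links with either    */
  }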

RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-14 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > The spec on:
> > https://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinki
> > ng has been updated and attempts to account for all the feedback. Not
> > everything has been possible to simplify/rework as requested but I
> > believe I have managed to address many points cleanly.
> 
> (FWIW there seem to be some weird line breaks in the page which make it
> a bit hard to read.)

Apologies, I edited it offline and didn't check the result carefully enough. 
I'll clean it up. 

> The main thing that stood out for me was section 9.  If we have the
> attributes and the program header (both good to have IMO) then we
> shouldn't have an ELF flag too.  "Static" consumers should use the
> attribute and "dynamic" consumers should use the program header.
> The main point of encoding future info in a program header was to
> relieve the pressure on the ELF flags.

I know what you mean. I kept the ELF flag around because it firstly already 
exists (with the correct meaning as it happens) and secondly ELF flags are 
already consumed in the program loader whereas a small amount of new framework 
in the kernel is needed for the loader to respond to program headers. The 
'executable stack' header is currently consumed but the mechanism is not 
extensible today. My thinking is that the ELF flag eases us into the program 
loader but could validly be dropped/not required long term. It is largely 
ignored by the tools anyway in favour of the program headers.

I am happy to remove the ELF flag if I can confirm with our MIPS kernel 
developers that they can implement the program header inspection sooner rather 
than later.
 
> As far as the program header encoding goes: I was thinking of a more
> general mechanism that specifies a block of data, a bit like the current
> PT_MIPS_OPTIONS does.  Encoding the information directly in the
> enumeration wouldn't scale well, since we'd end up with the same problem
> as we have now for ELF flags.  It would also be a bit wasteful to
> specify two bits of information this way since the other parts of the
> header structure don't carry any weight.

I was trying to avoid the need for a program header to refer to a block of data, 
as that is another part of the object that has to be loaded to determine the 
flag information. There are 2^28 processor-specific program header types 
available, which seems quite generous (I half thought of using 2 for the two 
modes), but I do also recognise that most of the header then becomes wasted 
space. I guess there may be some complaint if we choose to abuse every field of 
a header to encode information (i.e. address, size, alignment etc) but this 
would be a nice compact way to store flags. It would be more visible to put 
flags in the address fields as these are already printed by readelf et al. but 
the processor-specific flags are not. Personally I'd open up all the fields to 
abuse over adding a block of data. The block of data increases the complexity of 
the program loader and dynamic loader as they have to ensure more of an object 
is read in order to make a decision. The extra data needed from an object would 
also be target specific; it is all do-able, I'm just not sure about the 
complexity. I wonder if Joseph or Maciej have any thoughts here as I believe 
they discussed this idea of using program headers in the past. Since I'm far 
from being an expert in this area I'm OK with anything as long as I can get all 
maintainers of dynamic loaders and program loaders to agree (ha!). Bionic, 
glibc, uclibc and the Linux kernel are the primary targets here.

Regards,
Matthew


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-16 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Richard Sandiford  writes:
> >> Matthew Fortune  writes:
> >> >> I think instead we should have a configuration switch that allows
> >> >> a particular -mfp option to be inserted alongside -mabi=32 if no
> >> >> explicit -mfp is given.  This is how most --with options work.
> >> >> Maybe
> >> >> --with-fp- 32={32|64|xx}?  Specific triples could set a default
> >> >> value
> >> if they like.
> >> >> E.g. the MTI, SDE and mipsisa* ones would probably want to default
> >> >> to
> >> >> -- with-32-fp=xx.  Triples aimed at MIPS IV and below would stay
> >> >> as they are.  (MIPS IV is sometimes used with -mabi=32.)
> >> >>
> >> >> --with-fp-32 isn't the greatest name but is at least consistent
> >> >> with
> >> >> --with-arch-32 and -mabi=32.  Maybe --with-fp-32=64 is so weird
> >> >> that breaking consistency is better though.
> >> >
> >> > Tying the use of fpxx by default to a configure time setting is OK
> >> > with me. When enabled it would still have to follow the rules as
> >> > defined in the design in that it can only apply to architectures
> >> > that can support the variant.
> >>
> >> Right.  It's really equivalent to putting the -mfp on every command
> >> line that doesn't have one.
> >>
> >> > Currently that means everything but mips1.
> >>
> >> Yeah, using -mips1 on a --with-{o}32-fp=xx toolchain would be an
> error.
> >>
> >> > I'm not sure this is the same as tying an ABI to an architecture as
> >> > both fp32 and fpxx are O32 and link compatible. Perhaps the
> >> > configure switch would be --with-o32-fp={32|64|xx}. This shows it
> >> > is just an O32 related setting.
> >>
> >> What I meant is that -march= and -mips shouldn't imply a different
> >> -mfp setting.  The -mfp setting should be self-contained and it
> >> should be an error if the architecture isn't compatible.
> >>
> >> We might be in violent agreement here :-)  Like I say, I was just a
> >> bit worried by the earlier -mips32r2 thing because there was a time
> >> when a - mips option really could imply things like -mabi, -mgp and -
> mfp.
> >>
> >> --with-o32-fp would be OK with me.  I'm just worried about the ABI
> >> being spelt differently from -mabi=, but there's probably no perfect
> >> alternative.
> >
> > I'd like to encourage the perspective that -mfp* options do not lead
> > to a different ABI in the same sense that other variations do. While
> > it is true that the calling conventions and code generation rules
> > vary, 2 out of 3 combinations of -mfp32 -mfpxx and -mfp64 with
> > -mabi=o32 are link compatible.
> 
> -mfp32 and -mfp64 aren't link-compatible though, so -mfp is part of the
> ABI.
> What you're adding is a new variant that is individually link-compatible
> with the other two (but obviously not both simultaneously).  It's a
> third ABI variant in itself.
> 
> > The introduction of the modeless O32 ABI is intended to remove the
> > part of the O32 definition that says 'FR=0' and hence the architecture
> > then gets to dictate this and the generated code is still O32. It is
> > true today that we have several architectures that mandate FR=0, some
> > that cannot support fpxx and some that can support all fp* variations.
> > I see nothing preventing the future having an architecture only
> > supporting FR=1 though which we should also think about.
> 
> Agreed.
> 
> > When considering such a scenario it would be highly desirable for the
> > following to just work as I believe architectural restrictions should
> > be accounted for when designing default options. If the architecture
> > gives no choice then it should just work IMO:
> >
> > Some ideas (speculating that someone builds a core called mips_n with
> > only FR=1):
> >
> > --with-o32-fp=32
> >
> > mips-*-gcc -march=mips1 fp.c ==> generates fp32 code
> > mips-*-gcc -march=mips2 fp.c ==> generates fp32 code
> > mips-*-gcc -march=mips32r2 fp.c ==> generates fp32 code
> > mips-*-gcc -march=mips32r2 -mfp64 fp.c ==> generates fp64 code
> > mips-*-gcc -march=mips_n fp.c ==> generates fp64 code
> >
> > --with-o32-fp=xx
> >
> > mips-*-gcc -march=mips1 fp.c ==> genera

RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-17 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> >> > With these defaults, the closest supported ABI is used for each
> >> > architecture based on the --with-o32-fp build option. The only one
> >> > I really care about is the middle one as it makes full use of the
> >> > O32 FPXX ABI without a user needing to account for arch
> restrictions.
> >>
> >> Note that --with-* options just insert a canned -mfoo=bar option
> >> under certain conditions, with those conditions being the same
> >> regardless of "bar".
> >> So --with-o32-fp=32 should insert -mfp32 (and nothing else),
> >> --with-o32-
> >> fp=64 should insert -mfp64, etc.
> >>
> >> The rules should therefore be the same for both -mfp and --with-o32-
> fp.
> >> Should:
> >>
> >>   mips-*-gcc -march=mips1 -mfpxx
> >>
> >> generate -mfp32 code too?  It seems counter-intuitive to me.
> >>
> >> I suppose it depends on what you want -mfpxx to mean.  Do you want it
> >> to mean "use the new ABI that is link-compatible with both -mfp32 and
> >> - mfp64"
> >> or do you want it to mean "I don't care what the FR setting is; pick
> >> whatever seems best but be as flexible as possible"?  I'd assumed the
> >> former, in which case using it with an architecture that doesn't
> >> support it should be an error.
> >
> > In the end I do just want fpxx to mean use the new ABI that is
> > link-compatible. I think I have managed to confuse this discussion by
> > not understanding/separating vendor specific specs from generic option
> > handling as you explain later in your email. I only really wish to
> > allow a vendor specific config to infer a suitable default fp option
> > from -march (like -mabi=32 for 32-bit arch and -mabi=n32 for 64-bit
> arch).
> 
> Well, for avoidance of doubt, --with has priority over the vendor-
> specific choices, so really this comes down to what happens when no -mfp
> and --with-o32-fp options are used.

Yes that is what I understood.

> >> If you want to go for tha latter meaning then I think we should be
> >> careful to distinguish it from cases where we really are talking
> >> about the new ABI variant.  E.g. an ELF file has to commit to one of
> >> the three
> >> modes: you shouldn't have to look at the ELF's architecture setting
> >> in order to interpret the FP setting correctly.  And IMO the assembly
> >> code needs to commit to a specific mode too.  What do you think
> >> should happen
> >> for:
> >>
> >>   .module fp=xx
> >>   .module arch=mips_n
> >>
> >> Should the output be FR=X or FR=1?
> >
> > Well, even defining fpxx as simply being another abi variant there are
> > some issues. The current .set arch= options set fp32 for 32-bit
> > architectures and fp64 for 64-bit architectures which means we do have
> > to come up with some definition of how fpxx interacts with this. My
> > current implementation is that, for .set arch, the fp option is only
> > changed if the existing setting is incompatible with the new arch. So
> > carrying that logic to .module means that in the case above then the
> > output would be FPXX. Other examples would then be:
> >
> > .module fp=xx
> > .module arch=mips_n
> > 
> > .module fp=32
> > .module arch=mips_n
> > 
> >
> > .module fp=xx
> > .module arch=mips2
> > 
> > .module fp=64
> > .module arch=mips2
> >  (existing behaviour for .set)
> >
> > .module fp=xx
> > .module arch=mips1
> > 
> > .module fp=64
> > .module arch=mips1
> >  (existing behaviour for .set)
> >
> > This is weird though for the same reasons as you point out. You have
> > to know the arch to know what happens to the FP mode. If we just
> > continued with 32-bit arch setting fp=32 and 64-bit setting fp=64 then
> > we have a problem with something like mips_n where fp=32 would be
> > invalid. I really don't know what is best here!
> 
> The ".set mips" handling of gp and fp is really there for local changes
> to the architecture in a .set push/pop or .set mipsN/mips0.  (And IMO we
> the way we do it is a bit of a misfeature, but we have to keep it that
> way for compatibility.)
> 
> I don't think it should apply to .module though.  Ideally .module should
> work in the same way as passing the associated command-line option.

As it stands I wasn't planning on supporting .module arch=; I was just going to 
add .module fp= and leave it at that. The only thing I need to give assembly 
code writers absolute control over is the overall FP mode of the module. I 
don't currently see any real need to increase the control a user has over the 
architecture level. If we had .module arch= then having it just set the arch 
ignorant of FP mode seems fine; checking for erroneous combinations would be 
difficult due to some chicken-and-egg scenarios. Do you think I need to add 
.module arch= if I add .module fp= or can I take the easy option?

Regards,
Matthew


RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-18 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > As it stands I wasn't planning on supporting .module arch= I was just
> > going to add .module fp= and leave it at that. The only thing I need
> > to give assembly code writers absolute control over is the overall FP
> > mode of the module. I don't currently see any real need to increase
> > the control a user has over architecture level. If we had .module
> > arch= then having it just set the arch ignorant of FP mode seems fine,
> > checking for erroneous combinations would be difficult due to some
> > chicken and egg scenarios. Do you think I need to add .module arch= if
> > I add .module fp= or can I take the easy option?
> 
> Despite the "arch controlling fp" difference, I think .set and .module
> should use common parsing code.  I.e. we should generalise s_mipsset to
> handle both of them rather than write a second parsing function for
> .module.
> There will be some cases where the function has to check "is this .set?"
> (e.g. push/pop), but that's good IMO, because it makes the differences
> obvious.
> 
> If we do have a common routine then we should make .module handle
> everything it can handle rather than just fp=.

Every case would need to look at .set vs .module, as .set writes to mips_opts 
and .module writes to things like file_mips_arch. I suppose I could just rework 
all the global options to be part of a single mips_set_options structure to 
abstract this. Does that sound OK? The push/pop case (and perhaps some others) 
will still need special handling to prohibit them for .module.
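
Something along these lines is what I have in mind (a sketch only; the field 
names are illustrative rather than the actual gas internals):

  /* One option-block type, two instances: .set updates the current
     block, .module (and the command line) update the file-level block,
     so both directives can share parsing code.  */
  struct mips_set_options
  {
    int isa;            /* architecture level        */
    int fp;             /* 0 (unset), 32, 64, or xx  */
    int gp;             /* 32 or 64                  */
    /* ... other per-option fields ...  */
  };

  static struct mips_set_options mips_opts;  /* current .set state     */
  static struct mips_set_options file_opts;  /* module-level (.module) */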

Back to one of your questions discussing things like:

.module fp=xx
.module arch=mips2

An easy option would be to continue to have the arch options infer fp32 or fp64 
and require the .module fp=xx to come second. Then we just have error checking 
on the .module fp= option to ensure it is suitable for use with the previously 
specified architecture.

With .module in place like this I expect the compiler should then start to 
record more information in the assembly text to indicate things like arch as 
well as fp. Obviously this will be tied to a configure-time assembler feature 
test.

Regards,
Matthew


[RFC, MIPS] Relax NaN rules

2014-03-18 Thread Matthew Fortune
Hi,

I've sent this email to everyone who had opinions about the introduction of 
nan-2008 for mips according to the mailing list archives...

The NaN linkage rules introduced with -mnan=2008 enforce a strict rule that all 
code be built with either legacy NaN or 2008 NaN. This impacts both static and 
dynamic link and is implemented via an ELF flag. Two of the original threads 
about this are copied below for reference.

http://sourceware.org/ml/binutils/2013-07/msg00072.html
http://sourceware.org/ml/libc-ports/2013-08/msg00030.html
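
For reference, the check against the ELF flag itself is trivial; a minimal 
sketch in C (the EF_MIPS_NAN2008 value here is the one binutils defines, 
included in case <elf.h> lacks it):

  #include <elf.h>

  #ifndef EF_MIPS_NAN2008
  # define EF_MIPS_NAN2008 0x00000400   /* e_flags bit for 2008 NaN encoding */
  #endif

  /* Return nonzero if a 32-bit MIPS ELF header is marked as using the
     2008 NaN encoding.  Sanity checks (magic, class, machine) omitted.  */
  static int
  uses_nan2008 (const Elf32_Ehdr *ehdr)
  {
    return (ehdr->e_flags & EF_MIPS_NAN2008) != 0;
  }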

There are two limitations with the current support which make it difficult to 
use in production environments:

1) There is no way to mark a module as "don't care/not relevant". At a minimum 
this could be done via inspection of the GNU FP ABI attribute: when its value is 
'Any' then NaNs don't matter. Better still would be that modules with floating 
point only require a certain NaN state if they use functions like 
__builtin_[s]nan. This would partially reduce the impact of the strict NaN 
checks.

2) Independent of (1), the strictness associated with NaN handling is a serious 
problem for linux distributions where there is slow (if any) acceptance of new 
build variants. New build variants for distributions cost months to put 
together (just from build time alone) and then the fractured ecosystem this 
leads to means they are either unused or at best make users confused. It is 
also true that linux derivatives like Android (and other projects which seek to 
be architecture agnostic) simply cannot tolerate incompatibilities like the NaN 
handling rules and will not introduce more than one variant for native code.

I believe that 99% of users don't care about the difference between signalling 
and quiet NaNs and even fewer actually turn on trapping for signalling NaNs. 
Those who do use, and rely on, behaviour of signalling NaNs very much know that 
they need this and can afford some extra effort to ensure that they are handled 
as expected. The following thread is one discussion about how sNaN is a 
somewhat pointless concept for most users:

http://gcc.gnu.org/ml/gcc/2013-11/msg00106.html

My proposal is that NaN handling, as implemented, remains but we turn it off by 
default (or allow it to be turned off by default at build time). By this I mean 
that GCC will use the default dynamic linker even in the presence of -mnan=2008 
and ld will happily link together legacy and 2008 NaN code and ld.so will 
happily do the same. If a user really cares about NaNs then they need to 
firstly find someone who will build them a Linux distribution to support them 
and secondly pull a link time switch '-mstrict-nan' to enable the NaN checks as 
currently implemented. The same functionality would then be supported by ld.so 
via an environment variable (or whatever other runtime configuration source 
there is) to indicate that it must enforce NaN checks as well.

Suggestions for option names are welcome. Initial suggestions: 
--with-nan-check=no, -m[no-]strict-nan.

Regards,
Matthew


RE: [RFC, MIPS] Relax NaN rules

2014-03-18 Thread Matthew Fortune
Joseph Myers  writes:
> > 1) There is no way to mark a module as "don't care/not relevant". At a
> > minimum this could be done via inspection of the GNU FP ABI attribute
> > and when its value is 'Any' then NaNs don't matter. Better still would
> > be that modules with floating point only require a certain NaN state
> > if they use functions like __builtin_[s}nan. This would partially
> > reduce the impact of the strict NaN checks.
> 
> In general you can't tell whether a module cares.  It could have an 
> initializer
> 0.0 / 0.0, without having any function calls involving floating point (so in
> principle being independent of hard/soft float, but not of NaN format).  Or it
> could be written with knowledge of the ABI to do things directly with bit
> patterns (possibly based on a configure test rather than __mips_nan2008).
> The concept of a don't-care module is meaningful, but while heuristics can
> reliably tell that a module does care (e.g. GCC generated an encoding of a
> NaN bit-pattern, whether from __builtin_nan or 0.0 / 0.0) they can't so
> reliably tell that it doesn't care (although if it doesn't contain NaN bit-
> patterns, or manipulate representations of floating-point values through
> taking their addresses or using unions, you can probably be sure enough to
> mark it as don't-care - note that many cases where there are calls with
> floating-point arguments and results, but no manipulation of bit-patterns and
> no NaN constants, would be don't-care by this logic).

Thanks Joseph. I guess I'm not really pushing to have don't-care supported, as 
it would take a lot of effort to determine when code does and does not care; 
you rightly point out more cases to deal with too. I'm not sure whether the 
benefit would then be worth it, as there would still be modules which do and do 
not care about old and new NaNs, so it doesn't really relieve any pressure on 
toolchains or linux distributions. The second part of the proposal is more 
interesting/useful as it is saying "I don't care about the impact of getting the 
NaN encoding wrong" and a tools vendor/linux distribution then gets to make that 
choice. Any comments on that aspect?
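
To make the distinction concrete, a purely illustrative pair of cases in C 
(nothing here is specific to any toolchain):

  /* This translation unit cares about the NaN encoding: a quiet-NaN
     bit-pattern is baked into the object at compile time (the same is
     true of a 0.0 / 0.0 static initializer, which gets folded).  */
  static const double quiet_nan = __builtin_nan ("");

  /* This one has no NaN constants and no bit-pattern manipulation, only
     arithmetic on run-time values, so by the heuristic above it could
     plausibly be marked don't-care.  */
  double
  average (double a, double b)
  {
    return (a + b) / 2.0;
  }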

Regards,
Matthew



RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-18 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Richard Sandiford  writes:
> >> Matthew Fortune  writes:
> >> > As it stands I wasn't planning on supporting .module arch= I was
> >> > just going to add .module fp= and leave it at that. The only thing
> >> > I need to give assembly code writers absolute control over is the
> >> > overall FP mode of the module. I don't currently see any real need
> >> > to increase the control a user has over architecture level. If we
> >> > had .module arch= then having it just set the arch ignorant of FP
> >> > mode seems fine, checking for erroneous combinations would be
> >> > difficult due to some chicken and egg scenarios. Do you think I
> >> > need to add .module arch= if I add .module fp= or can I take the
> easy option?
> >>
> >> Despite the "arch controlling fp" difference, I think .set and
> >> .module should use common parsing code.  I.e. we should generalise
> >> s_mipsset to handle both of them rather than write a second parsing
> >> function for .module.
> >> There will be some cases where the function has to check "is this
> .set?"
> >> (e.g. push/pop), but that's good IMO, because it makes the
> >> differences obvious.
> >>
> >> If we do have a common routine then we should make .module handle
> >> everything it can handle rather than just fp=.
> >
> > Every case would need to look at set vs module as .set writes to
> > mips_opts and .module writes to things like file_mips_arch? I suppose
> > I could just rework all the global options to be part of a single
> > mips_set_options structure to abstract this. Does that sound OK?
> 
> Yeah, that's the kind of thing I was thinking of.  FWIW, if this is
> feeling like feature creep then I have a go this weekend.

Although it does seem like feature creep, I'm happy to do it as it gives a 
smaller problem for me to work on and submit to go through the motions. It may 
however be worthwhile tying this in with the introduction of -mfpxx to allow 
one assembler feature test to infer that both the new .module support and fpxx 
are implemented (or would you do this as two tests anyway?).
 
> > The push/pop case (and perhaps some others) will still need special
> > handling to prohibit them for .module.
> 
> Right.  But like I say, that's good IMO, since the differences become
> more obvious than they'd be with two different implementations.
> 
> > Back to one of your questions discussing things like:
> >
> > .module fp=xx
> > .module arch=mips2
> >
> > An easy option would be to continue to have the arch options infer
> > fp32 or fp64 and require the .module fp=xx to come second. Then we
> > just have error checking on the .module fp= option to ensure it is
> > suitable for use with the previously specified architecture.
> 
> I don't think it's a good idea for the order of the .modules to matter.
> I think as far as possible, putting .module in the code should be the
> same as passing the associated command-line option.
> 
> It probably makes sense to prohibit .module after an instruction has
> been assembled or after .set has been seen.  We could then handle the
> .module-adjusted options at those points (or at the end, if there are no
> .sets and no instructions).
> 
> > With .module in place like this then I expect the compiler should then
> > start to record more information in the assembly text to indicate
> > things like arch as well as fp. Obviously this will be tied to a
> > configure time assembler feature test.
> 
> Agreed.

Fantastic. I think the only loose end is:
http://gcc.gnu.org/ml/gcc/2014-03/msg00204.html

I'm concerned about the program loader and dynamic linker having to read a 
segment as well as the header to get the feature bits, when the program header 
fields could instead be interpreted specially for a new program header type. 
With 7 32-bit fields there are 224 bits of data available, which seems quite 
generous and would be simple to read. As it stands ld.so carries around a 
pointer to, and count of, the program headers, which means the data directly 
present in the headers is exceptionally easy to read; a segment is harder. (I 
hope I have the right terminology there with 'segment'.)
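
A minimal sketch in C of what I mean; PT_MIPS_FPMODE and the bit values are 
invented purely to show that the data would be available without reading any 
segment:

  #include <elf.h>
  #include <link.h>
  #include <stddef.h>

  /* Hypothetical: a processor-specific program header whose fields carry
     the FP requirement bits directly, so ld.so or the kernel loader can
     read them from the header table it already has in hand.  */
  #define PT_MIPS_FPMODE  (PT_LOPROC + 0x123)   /* invented value */
  #define FP_REQ_FR0      0x1                   /* invented bit   */
  #define FP_REQ_FR1      0x2                   /* invented bit   */

  static unsigned long
  fp_requirements (const ElfW(Phdr) *phdr, size_t phnum)
  {
    for (size_t i = 0; i < phnum; i++)
      if (phdr[i].p_type == PT_MIPS_FPMODE)
        return phdr[i].p_vaddr;          /* flags straight from a field */
    return FP_REQ_FR0 | FP_REQ_FR1;      /* unmarked: treat as "any"    */
  }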

Regards,
Matthew



RE: [RFC, MIPS] Relax NaN rules

2014-03-23 Thread Matthew Fortune
Maciej W. Rozycki  writes:
> On Sat, 22 Mar 2014, Richard Sandiford wrote:
> 
> > > Thanks Joseph. I guess I'm not really pushing to have don't-care
> > > supported as it would take a lot of effort to determine when code
> > > does and does not care, you rightly point out more cases to deal
> > > with too. I'm not sure if the benefit would then be worth it or not
> > > as there would still be modules which do and do not care about old
> > > and new NaNs so it doesn't really relieve any pressure on toolchains
> > > or linux distributions. The second part of the proposal is more
> > > interesting/useful as it is saying I don't care about the impact of
> > > getting NaN encoding wrong and a tools vendor/linux distribution
> > > then gets to make that choice. Any comments on that aspect?
> >
> > Maybe it's just me, but I don't understand your use case for (2).
> > If 99% of users don't care about the different NaN encodings then why
> > would they use a different -mnan setting from the default?

MSA requires nan2008. The use case for (2) is getting MSA used in any of the
pre-built or carefully controlled linux distributions that cannot tolerate
multilib-style root file systems and do not accept new build variants easily.
(All the topics I am currently working on are about removing the need for
multilibs to make use of new architecture features; the origin of O32 FR
interlinking is MSA requiring FP64.)

> > Are you worried about potential future processors that only support
> > 2008 NaNs?  If so, then (a) you give good reasons why that seems like
> > a misfeature and (b) I think we should instead tackle it by continuing
> > to allow both -mnan settings for all processors.  I.e. we simply
> > wouldn't add any logic to check "is this processor/architecture
> > compatible with this encoding?".
> 
>  Such processors already exist I believe.  Matthew will fill you in on
> the details, but IIRC they are strapped at boot to either NaN mode that
> cannot be switched at the run time, i.e. via CP1.FCSR (the NAN2008 bit
> is fixed at 0 or 1 depending on the boot mode selected).

The 'proAptiv' is the only core I am aware of that has both modes in hardware.
I'm afraid I don't know whether this is boot-time only or runtime configurable;
I have heard both and don't remember what is actually in the core.

> Of course that
> means they can still run legacy NaN code if strapped for the legacy NaN
> mode, but it's up to the SOC/board hardware designer to decide which
> mode to choose and we have no control over that.

Indeed and this could pose a problem for the pre-built distros as described
earlier depending on how the nan rules are implemented.

> 
>  I feel uneasy about silently producing wrong results, even if it's only
> such a border case as for many the NaN is.  Perhaps switching the kernel
> into a full FP emulation mode for programs that have an unsupported NaN
> requirement would be an option?

The issue here is that pre-existing binary distros are nan1985 which would
mean crippling the performance of the latest cores.

> That should be doable with a reasonable
> effort and wouldn't affect programs that do not use the FPU at all, so
> no need to rush rebuilding everything, just the FP-intensive software
> packages.

This unfortunately still leaves us with a necessity to have new build
variants, which although not impossible is also not likely to happen in
any sensible time frame. The compiler/tools engineer in me completely
agrees with the position that nan encoding differences should be accounted
for, but there are scenarios where the cost of doing this outweighs the
benefits, I think.

A couple of ideas to address some of the various concerns:

1) As per my original proposal of allowing the tools to be built in a mode
   that ignores nan encoding... but add a special symbol introduced by
   the linker which the fenv code could use to warn when enabling exceptions
   that could trigger incorrectly.
2) Allow MSA to be built with nan1985 even though the hardware will always
   require nan2008. (This is really just another way of masking the problem
   but is possibly even less safe as the objects would have no record of the
   'correct' encoding requirement.)

Please bear in mind that I am only trying to create a build option that
vendors can use if that is the only alternative available to them. There are
environments that can cope better with the nan encoding checking although
even those environments would benefit from being able to mark objects as
nan-independent (gnu_attributes would have worked well for this). The easiest
example of something that could cope better would be an RTOS where the RTOS
itself shouldn't have to be built twice for nan encoding but the user tasks
could easily be built using a consistent nan encoding.

Regards,
Matthew


RE: [RFC, MIPS] Relax NaN rules

2014-03-24 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Maciej W. Rozycki  writes:
> >> On Sat, 22 Mar 2014, Richard Sandiford wrote:
> >>
> >> > > Thanks Joseph. I guess I'm not really pushing to have don't-care
> >> > > supported as it would take a lot of effort to determine when code
> >> > > does and does not care, you rightly point out more cases to deal
> >> > > with too. I'm not sure if the benefit would then be worth it or
> >> > > not as there would still be modules which do and do not care
> >> > > about old and new NaNs so it doesn't really relieve any pressure
> >> > > on toolchains or linux distributions. The second part of the
> >> > > proposal is more interesting/useful as it is saying I don't care
> >> > > about the impact of getting NaN encoding wrong and a tools
> >> > > vendor/linux distribution then gets to make that choice. Any
> comments on that aspect?
> >> >
> >> > Maybe it's just me, but I don't understand your use case for (2).
> >> > If 99% of users don't care about the different NaN encodings then
> >> > why would they use a different -mnan setting from the default?
> >
> > MSA requires nan2008.
> 
> Ah, OK.
> 
> > A couple of ideas to address some of the various concerns:
> >
> > 1) As per my original proposal of allowing the tools to be built in a
> mode
> >that ignores nan encoding... but add a special symbol introduced by
> >the linker which the fenv code could use to warn when enabling
> exceptions
> >that could trigger incorrectly.
> 
> The problem with a third mode is that, as discussed upthread, there's no
> way in general to tell whether a given bit of code cares about sNaNs or
> not.
> So the third mode is really just an assertion by the user that NaN
> encodings don't matter.  I think in that case a separate mode is only
> useful if:
> 
> (a) the entire single-variant base system is built in don't-care mode
> (a bit like it would be built with -mfpxx, which is what makes -
> mfpxx
> useful).  But is it really feasible for the system to be completely
> agnostic about the NaN encoding?  I assume any long double emulation
> code (if using the normal form of n32/64) and any float parsing code
> would then need to look at the FCSR before generating a NaN.
> NaNs in C initialisers would be disallowed.  Etc.  These are rules
> that would need to be followed throughout the codebase, even the
> target-independent parts.  Would a portable codebase really be
> willing
> to accept that for a MIPS oddity?
> 
> If instead some routines in some system libraries assume a
> particular NaN
> encoding but the library binaries themselves claim to be "don't
> care"
> (on the basis that everything is OK if you avoid those routines)
> then we lose the ability to say that using a "don't care" library
> will not in itself cause your application to depend on NaN
> encodings.
> And at that point any automatic rules based on the library-level
> markup
> lose their value.  Also, without a guarantee like that, it becomes
> very
> hard for a user to know whether they will be affected by that
> encoding
> or not.
> 
> and
> (b) the user has to explicitly say that they don't care, rather than it
> being implied by things like -mmsa.  I think you wanted it the other
> way around, with "don't care" being the default and the user having
> to say explicitly that they care.
> 
> Otherwise, all the third mode does is cause the system to reject any
> binary that is careful enough to say that it wants one encoding over
> another and relies on no feature that would stop it running on the
> processor otherwise.  (This excludes MSA-only binaries, for example,
> since they would be rejected on non-MSA systems anyway.)
> 
> Without (a) and (b) it seems like a lot of complication for a very
> narrow use case.  If we have a system built for one encoding and a
> processor that uses another then in some ways...

I don't think I was clear enough here: I agree with your points above about
what would be necessary to support a third 'I don't care' mode, and I don't
think this effort is really worth it either.

I am more interested in giving tools builders the option to produce tools
that just ignore NAN2008 ELF flags (as if all the work on tracking nan
encodings were never done). I know this leaves a sour taste as we are
allowing inconsistenc
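
For concreteness, here is roughly the per-object check that such
'NaN-agnostic' tools would simply skip (illustrative only: it assumes the
EF_MIPS_NAN2008 e_flags bit as defined in <elf.h>, and the helper names are
made up rather than existing binutils interfaces):

/* Check whether a 32-bit MIPS object is marked as using 2008 NaN encoding,
   and whether two objects agree.  A tool built in "ignore" mode would not
   perform the second check when merging objects.  */
#include <elf.h>
#include <stdbool.h>

#ifndef EF_MIPS_NAN2008
#define EF_MIPS_NAN2008 0x00000400	/* Assumed value for older <elf.h>.  */
#endif

static bool
uses_2008_nan (const Elf32_Ehdr *ehdr)
{
  return (ehdr->e_flags & EF_MIPS_NAN2008) != 0;
}

static bool
nan_encodings_compatible (const Elf32_Ehdr *a, const Elf32_Ehdr *b)
{
  return uses_2008_nan (a) == uses_2008_nan (b);
}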

RE: [RFC, MIPS] Relax NaN rules

2014-03-25 Thread Matthew Fortune
Joseph Myers  writes:
> On Tue, 25 Mar 2014, Rich Fuhler wrote:
> 
> > Hi Richard, we talked about (a.) originally - it was the design of the
> > libraries. Joseph, as I recollect, you raised language issues with
> > requirements for compile-time constant values for NaNs. Would you
> > accept a non-constant NaN implementation in glibc? Basically, I would
> > envision
> > __builtin_nan("") to be 0.0/0.0. Probably not a problem for C++ or
> > most code.
> 
> 0.0/0.0 is not correct for NaN when used outside static initializers,
> because it raises INVALID when doing the division at runtime.  Raising
> INVALID when not wanted is exactly the same problem you'll get if you
> mix code that uses different NaN conventions but doesn't care about
> signaling NaNs per se, so 0.0/0.0 doesn't help there.  And in static
> initializers it doesn't help at all because then it will get folded to
> the same NaN bit-pattern as __builtin_nan("").  So you don't gain
> anything from such a change.  Literally disallowing NaN static
> initializers brings you into the realm of weird non-IEEE floating-point
> configurations we should not be adding support for in glibc.
> 
> If you want to use code built for the new NaN convention without
> requiring libraries built for that convention (and are willing to risk
> random INVALID exceptions as NaNs get passed between the conventions),
> as already stated the obvious approach is a command-line option or
> combination of options meaning "build code for this convention, but use
> the other convention when marking object files and choosing libraries
> and a dynamic linker".

Can you envisage any way of us raising a warning/error if INVALID
exceptions get enabled in this hybrid NaN world? I believe that is the
only major problem area with mixing NaNs. That is, it should be possible to
introduce a magic symbol if LD merged opposing NaN modules and have
fesetexcept check for it, but I can't think of a way to indicate to
fesetexcept if LDSO linked opposing NaN modules (I'm not sufficiently
experienced with dynamic linkers to know what is possible though).

> What's the status of the Linux kernel support for the new NaN
> convention, incidentally?  glibc built for the new convention sets
> arch_minimum_kernel=10.0.0 until the support is in kernel.org and so the
> value can be set to the actual first kernel with the support
> (arch_minimum_kernel=10.0.0 may be used in future for any other cases
> where glibc support for a port or port variant gets in before the kernel
> support.)

Kernel support is queued up against a significant number of other features
that are being actively worked on. I do know that it is on the list, just
not necessarily close to the top yet. Given the chicken-and-egg situation
we have in terms of enabling nan2008 glibc from a specific kernel version,
this of course has to be done before any glibc with nan2008 support can
even be considered by a distribution.

Regards,
Matthew


RE: [RFC, MIPS] Relax NaN rules

2014-03-26 Thread Matthew Fortune
Joseph Myers  writes:
> On Tue, 25 Mar 2014, Matthew Fortune wrote:
> 
> > Can you envisage any way of us raising a warning/error if INVALID
> > exceptions get enabled in this hybrid NaN world? I believe that is the
> > only major problem area with mixing NaNs. I.e. It should be possible
> > to introduce a magic symbol if LD merged opposing NaN modules and have
> > fesetexcept check for it but I can't think up a way to indicate to
> > fesetexcept if LDSO linked opposing NaN modules (I'm not sufficiently
> > experienced with dynamic linkers to know what is possible though).
> 
> fesetexcept - a new function in TS 18661-1, not implemented in glibc -
> has nothing to do with this; the issue is about exceptions from
> arithmetic rather than those from functions explicitly setting them.
> And to a first approximation you can ignore exceptions being "enabled"
> (i.e. traps being
> enabled) and presume they are tested for by C99 facilities such as
> fetestexcept - which has no way to return any sort of error status.

Sorry Joseph, I misspoke: my brain thought one thing and my fingers
typed another. I meant to refer to feenableexcept. I was thinking more
along the lines of having such things as feenableexcept and fetestexcept
simply abort (or at least be noisy) if they were called to check for/trap
for INVALID exceptions from an application that included both NaN
encodings. This too is probably overkill as INVALID has many more reasons
to be triggered than sNaN and the abort/warning would happen for those
cases too.
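
For what it's worth, a very rough sketch of how the statically linked case
might look (the magic symbol and the wrapper are invented names; nothing
like this exists in glibc today):

/* Assume LD defines __mips_mixed_nan_objects when it links objects with
   opposing NaN encodings.  A NaN-aware feenableexcept could then warn (or
   abort) when asked to enable trapping on FE_INVALID.  */
#define _GNU_SOURCE
#include <fenv.h>
#include <stdio.h>

extern const char __mips_mixed_nan_objects __attribute__ ((weak));

int
checked_feenableexcept (int excepts)
{
  if ((excepts & FE_INVALID) && &__mips_mixed_nan_objects != NULL)
    fprintf (stderr, "warning: enabling FE_INVALID traps in a program "
	     "linked from objects with differing NaN encodings\n");
  return feenableexcept (excepts);
}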

I am warming to the combined ideas of Richard and Maciej for allowing
MSA to be compiled with the 'wrong' NaN encoding and also supporting a
kernel mode to do full FPU emulation when software and hardware NaN
encodings differ. The issue, though, is that full emulation effectively
renders MSA useless: MSA only exists in NaN2008 cores anyway, and that is
exactly the case that would trigger emulation. This then removes one of the
primary reasons for needing mixed NaN support in the first place, so the
kernel emulation option may just have to be used at a user's discretion.

I'll talk to the engineers working on Linux distributions and see what
they think. I really appreciate all the discussion on this topic.

Regards,
Matthew


HARD_REGNO_CALL_PART_CLOBBERED and regs_invalidated_by_call

2014-04-02 Thread Matthew Fortune
Hi Richard,

As part of implementing the new O32 FPXX ABI I am making use of the
HARD_REGNO_CALL_PART_CLOBBERED macro to allow odd-numbered floating-point
registers to be considered as 'normally' callee-saved but call clobbered if
they are being used to hold SImode or SFmode data. The macro is implemented as:

/* Odd numbered single precision registers are not considered call saved
   for O32 FPXX as they will be clobbered when run on an FR=1 FPU.  */
#define HARD_REGNO_CALL_PART_CLOBBERED(REGNO, MODE) \
  (TARGET_FLOATXX && ((MODE) == SFmode || (MODE) == SImode) \
   && FP_REG_P (REGNO) && (REGNO & 1))
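
For illustration (an assumed example rather than one of my actual tests),
the practical effect is that a single-precision value that is live across a
call can no longer sit in an odd-numbered FP register when compiling for
O32 FPXX:

/* Built with -mabi=32 -mfpxx.  With the macro above in place the register
   allocator must either keep 'x' in one of the even-numbered call-saved FP
   registers or spill it around the call, because an FR=1 callee may
   clobber the odd-numbered single-precision registers.  */
extern void g (void);

float
keep_across_call (float x)
{
  g ();
  return x + 1.0f;
}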

IRA and LRA appear to work correctly w.r.t. HARD_REGNO_CALL_PART_CLOBBERED
and I get the desired O32 FPXX ABI behaviour. However when writing a number
of tests
for this I triggered some optimisations (in particular regcprop) which ignored 
the fact that the odd-numbered single-precision registers are clobbered across 
calls and essentially undid the work IRA/LRA did in treating the register as 
clobbered. The reason for regcprop ignoring the call-clobbered nature of these 
registers is that it simply does not check. The test for call-clobbered 
registers solely relies on regs_invalidated_by_call which is (and cannot be) 
aware of the HARD_REGNO_CALL_PART_CLOBBERED macro as it has no information about
what mode registers are in when it is used. A proposed fix is inline below for
this specific issue.

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 101de76..cb2937c 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -1030,8 +1030,10 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 		}
 	    }
 
-	  EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, regno, hrsi)
-	    if (regno < set_regno || regno >= set_regno + set_nregs)
+	  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+	    if ((TEST_HARD_REG_BIT (regs_invalidated_by_call, regno)
+		 || HARD_REGNO_CALL_PART_CLOBBERED (regno, vd->e[regno].mode))
+		&& (regno < set_regno || regno >= set_regno + set_nregs))
 	      kill_value_regno (regno, 1, vd);
 
 	  /* If SET was seen in CALL_INSN_FUNCTION_USAGE, and SET_SRC

The problem is that there are several other passes that solely rely on
regs_invalidated_by_call to determine call-clobbered status and will therefore
make the same mistake. Some of these passes simply don't have mode information
around when handling call-clobbered registers, which leaves me a little unsure
of the best solution in those cases. Being over-cautious and always marking a
potentially clobbered register as clobbered seems like one option, but there
is a risk that doing so could lead to legitimate uses of a callee-saved
register (in a mode that is not part clobbered) being broken.  Essentially I
would propose introducing another register set 'regs_maybe_invalidated_by_call'
that includes all of regs_invalidated_by_call plus anything
HARD_REGNO_CALL_PART_CLOBBERED reports true for when checking all registers
against all modes. Wherever call-clobbered information is required but mode
information is unavailable, regs_maybe_invalidated_by_call would then be
used.  As I said though there are probably some corner cases to handle too.
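
Something along the following lines is what I have in mind (names and
placement are purely illustrative, not an existing GCC interface):

/* Conservative superset for passes that have no mode information to hand:
   everything in regs_invalidated_by_call, plus every register that
   HARD_REGNO_CALL_PART_CLOBBERED reports clobbered for at least one mode.
   Uses the definitions from GCC's hard-reg-set.h and machmode.h.  */
static HARD_REG_SET regs_maybe_invalidated_by_call;

static void
init_regs_maybe_invalidated_by_call (void)
{
  COPY_HARD_REG_SET (regs_maybe_invalidated_by_call,
		     regs_invalidated_by_call);
  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    for (int m = 0; m < NUM_MACHINE_MODES; m++)
      if (HARD_REGNO_CALL_PART_CLOBBERED (regno, (enum machine_mode) m))
	{
	  SET_HARD_REG_BIT (regs_maybe_invalidated_by_call, regno);
	  break;
	}
}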

I don't quite have the O32 FPXX patches ready to send out yet, but this issue
is relevant to all architectures using HARD_REGNO_CALL_PART_CLOBBERED;
presumably nobody has hit it yet though.

Regards,
Matthew


RE: HARD_REGNO_CALL_PART_CLOBBERED and regs_invalidated_by_call

2014-04-05 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Hi Richard,
> >
> > As part of implementing the new O32 FPXX ABI I am making use of the
> > HARD_REGNO_CALL_PART_CLOBBERED macro to allow odd-numbered
> > floating-point registers to be considered as 'normally' callee-saved
> > but call clobbered if they are being used to hold SImode or SFmode
> > data. The macro is implemented as:
> >
> > /* Odd numbered single precision registers are not considered call
> saved
> >for O32 FPXX as they will be clobbered when run on an FR=1 FPU.
> */
> > #define HARD_REGNO_CALL_PART_CLOBBERED(REGNO, MODE)
> \
> >   (TARGET_FLOATXX && ((MODE) == SFmode || (MODE) == SImode)
> \
> >&& FP_REG_P (REGNO) && (REGNO & 1))
> 
> Under these conditions the entire value is call-clobbered though.
> It might be better to say that the odd-numbered registers are always
> call-clobbered (which I think is more accurate) but force them to be
> saved by functions that use them.  This is in some ways similar to the
> way that interrupt handlers save call-clobbered registers.

The problem here is that these registers form the second part of 64-bit 
call saved registers and therefore have to be marked as call-saved for 
64-bit values regardless of their behaviour for 32-bit values.  I have 
tried to simply mark them as call-clobbered and the effect was that 
64-bit values ended up being seen as call-clobbered.  Approaching this 
problem from both sides leads to complications as O32 FPXX has a register
which behaves differently based on mode. From experimentation I believe 
this approach to be the neatest and use existing features of GCC.

> Maybe some RA heuristics will need tweaking to reflect the extra cost
> of these registers, but I imagine that's true either way.

Perhaps. I am currently thinking/hoping that simply having them marked 
as call clobbered will be enough.

In terms of the HARD_REGNO_CALL_PART_CLOBBERED macro, I would say that
it is poorly named as it gives no indication to GCC internals as to
which part of the register is clobbered and which is not. When this macro
returns true GCC simply takes that to mean the whole register is
clobbered. I'd say that then makes my usage legitimate.  If I had time
and willpower I'd remove the 'PART' from this macro and switch over to
having it as the primary source of call-clobbered information in GCC,
doing away with call_used_regs etc. Having the mode information is quite
valuable as we are seeing with SIMD ISA extensions that use this macro.
Supporting two sources of call-clobbered data is probably what led us to
the current situation of broken optimisation passes.

My concern in this post is not really my specific usage of
HARD_REGNO_CALL_PART_CLOBBERED but rather the fact that the internals
of GCC do not take account of it.  I would expect that with a little work I
could show the same optimisation problem for some pre-existing use of the
macro.  Do you agree with my brief analysis that at least one pass
(regcprop) is wrong w.r.t. this macro? I should be ready to send the FPXX
GCC patch next week, which will show the problem directly.

Regards,
Matthew 


RE: HARD_REGNO_CALL_PART_CLOBBERED and regs_invalidated_by_call

2014-04-07 Thread Matthew Fortune
Richard Sandiford  writes:
> Matthew Fortune  writes:
> > Richard Sandiford  writes:
> >> Matthew Fortune  writes:
> >> Maybe some RA heuristics will need tweaking to reflect the extra
> cost
> >> of these registers, but I imagine that's true either way.
> >
> > Perhaps. I am currently thinking/hoping that simply having them
> marked
> > as call clobbered will be enough.
> 
> FWIW, I have some patches queued for stage 1 that tell the target-
> independent code which registers are saved by the current function.
> This makes things like interrupt functions less magical.
> Maybe it would help with exposing the "call-clobbered and call-saved"
> thing to RA.

That could be useful if existing costs don't lead to the correct
registers being targeted first.
 
> > In terms of the HARD_REGNO_CALL_PART_CLOBBERED macro. I would say
> that
> > it is poorly named as it gives no indication to GCC internals as to
> > which part of the register is clobbered and which is not.
> 
> I don't think it was really supposed to matter.  (Not that I'm
> defending the name. :-))
> 
> > When this macro
> > returns true GCC simply takes that to mean the whole register is
> > clobbered. I'd say that then makes my usage legitimate. If I had time
> > and will power I'd remove the 'PART' from this macro and switch over
> > to having it as the primary source of call-clobbered information in
> > GCC doing away with call_used_regs etc. Having the mode information
> is
> > quite valuable as we are seeing with SIMD ISA extensions that use
> this macro.
> > Supporting two sources of call-clobbered data is probably what lead
> us
> > to the current situation of broken optimisation passes.
> 
> Definitely agree with the last part.  But I think it'd be better to fix
> it the other way: get rid of the existing
> HARD_REGNO_CALL_PART_CLOBBERED uses in the generic code and replace
> things like call_used_regs with mode-indexed HARD_REG_SETs of the call-
> clobbered registers.  You could then add a cleaner target interface
> that says whether a given register is call-clobbered in a given mode.
> The default could use the existing HARD_REGNO_CALL_PART_CLOBBERED,
> CALL_USED_REGISTERS and CALL_REALLY_USED_REGISTERS (another horrible
> part of the interface).

Indeed, and this could potentially capture information on which part of a
register is clobbered, though it is difficult to imagine what to then do
with such information!
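
To make the suggestion concrete, something like the following is roughly how
I picture the mode-indexed sets (names purely illustrative, not an actual
GCC interface):

/* One set of call-clobbered registers per machine mode, combining
   call_used_regs with HARD_REGNO_CALL_PART_CLOBBERED so that passes only
   ever consult a single source of truth.  */
static HARD_REG_SET call_clobbered_regs_by_mode[NUM_MACHINE_MODES];

static void
init_call_clobbered_regs_by_mode (void)
{
  for (int m = 0; m < NUM_MACHINE_MODES; m++)
    {
      CLEAR_HARD_REG_SET (call_clobbered_regs_by_mode[m]);
      for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
	if (call_used_regs[regno]
	    || HARD_REGNO_CALL_PART_CLOBBERED (regno, (enum machine_mode) m))
	  SET_HARD_REG_BIT (call_clobbered_regs_by_mode[m], regno);
    }
}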

> Doing it that way wouldn't involve any changes to port-specific code
> other than MIPS.  Other ports could use the default implementation and
> switch to the new interface later.
> 
> I realise that might be more work than you were planning on, but if
> you're having to change existing passes anyway then we might as well
> fix it in a way that reduces the number of places that need to check
> two things instead of one.

I have an urgent need to get all this working so it may simply have to
include whatever changes are necessary.

> All IMO of course.  I don't maintain this part of GCC.

It's not clear who would be best to talk to about this. I've added Jeff
as he appears to be listed against some mostly relevant areas in the
maintainers file. If you could point me towards anyone more suitable
that would be good.

Regards,
Matthew


RE: HARD_REGNO_CALL_PART_CLOBBERED and regs_invalidated_by_call

2014-04-07 Thread Matthew Fortune
Actually add Jeff this time...

> -Original Message-
> From: Matthew Fortune
> Sent: 07 April 2014 09:07
> To: 'Richard Sandiford'
> Cc: gcc@gcc.gnu.org
> Subject: RE: HARD_REGNO_CALL_PART_CLOBBERED and regs_invalidated_by_call
> 
> Richard Sandiford  writes:
> > Matthew Fortune  writes:
> > > Richard Sandiford  writes:
> > >> Matthew Fortune  writes:
> > >> Maybe some RA heuristics will need tweaking to reflect the extra
> > cost
> > >> of these registers, but I imagine that's true either way.
> > >
> > > Perhaps. I am currently thinking/hoping that simply having them
> > marked
> > > as call clobbered will be enough.
> >
> > FWIW, I have some patches queued for stage 1 that tell the target-
> > independent code which registers are saved by the current function.
> > This makes things like interrupt functions less magical.
> > Maybe it would help with exposing the "call-clobbered and call-saved"
> > thing to RA.
> 
> That could be useful if existing costs don't lead to the correct
> registers being targeted first.
> 
> > > In terms of the HARD_REGNO_CALL_PART_CLOBBERED macro. I would say
> > that
> > > it is poorly named as it gives no indication to GCC internals as to
> > > which part of the register is clobbered and which is not.
> >
> > I don't think it was really supposed to matter.  (Not that I'm
> > defending the name. :-))
> >
> > > When this macro
> > > returns true GCC simply takes that to mean the whole register is
> > > clobbered. I'd say that then makes my usage legitimate. If I had
> > > time and will power I'd remove the 'PART' from this macro and switch
> > > over to having it as the primary source of call-clobbered
> > > information in GCC doing away with call_used_regs etc. Having the
> > > mode information
> > is
> > > quite valuable as we are seeing with SIMD ISA extensions that use
> > this macro.
> > > Supporting two sources of call-clobbered data is probably what lead
> > us
> > > to the current situation of broken optimisation passes.
> >
> > Definitely agree with the last part.  But I think it'd be better to
> > fix it the other way: get rid of the existing
> > HARD_REGNO_CALL_PART_CLOBBERED uses in the generic code and replace
> > things like call_used_regs with mode-indexed HARD_REG_SETs of the
> > call- clobbered registers.  You could then add a cleaner target
> > interface that says whether a given register is call-clobbered in a
> given mode.
> > The default could use the existing HARD_REGNO_CALL_PART_CLOBBERED,
> > CALL_USED_REGISTERS and CALL_REALLY_USED_REGISTERS (another horrible
> > part of the interface).
> 
> Indeed and this could potentially capture information on which part of a
> register is clobbered though it is difficult to imagine what to then do
> with such information!
> 
> > Doing it that way wouldn't involve any changes to port-specific code
> > other than MIPS.  Other ports could use the default implementation and
> > switch to the new interface later.
> >
> > I realise that might be more work than you were planning on, but if
> > you're having to change existing passes anyway then we might as well
> > fix it in a way that reduces the number of places that need to check
> > two things instead of one.
> 
> I have an urgent need to get all this working so it may simply have to
> Include whatever changes are necessary.
> 
> > All IMO of course.  I don't maintain this part of GCC.
> 
> It's not clear who would be best to talk to about this. I've added Jeff
> as he appears to be listed against some mostly relevant areas in the
> maintainers file. If you could point me towards anyone more suitable
> that would be good.
> 
> Regards,
> Matthew