Re: GCC 5.1.1 Status Report (2015-06-22)

2015-07-01 Thread Richard Biener
On Tue, 30 Jun 2015, Jason Merrill wrote:

> I'm interested in your thoughts on fixing c++/65945 in 5.2.
> 
> It's trivial to fix the alignment of nullptr_t, but I was concerned about ABI
> impact.  On further research it seems that it won't cause any trouble with
> argument alignment, since that doesn't seem to rely on TYPE_ALIGN at all; I
> think the only ABI breakage would come from unaligned nullptr_t fields in
> classes, which I expect to be very rare. The testcases that were breaking on
> SPARC and ARM without this fix have to do with local stack slots, which are
> not part of an interface.
> 
> So I think we can change this without breaking a significant amount of code,
> and better to break it now than after we've settled into the new library ABI.
> We should certainly mention it prominently in the release notes if we do, and
> I've added a -Wabi warning for the field alignment change.
> 
> Does this make sense to you?

Yes, that makes sense to me.

Richard.

> Jason
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: pa indirect_jump instruction

2015-07-01 Thread Trevor Saunders
On Tue, Jun 30, 2015 at 09:53:31PM +0100, Richard Sandiford wrote:
> I have a series of patches to convert all non-optab instructions to
> the target-insns.def interface.  config-list.mk showed up one problem
> though.  The pa indirect_jump pattern is:
> 
> ;;; Hope this is only within a function...
> (define_insn "indirect_jump"
>   [(set (pc) (match_operand 0 "register_operand" "r"))]
>   "GET_MODE (operands[0]) == word_mode"
>   "bv%* %%r0(%0)"
>   [(set_attr "type" "branch")
>    (set_attr "length" "4")])
> 
> so the C condition depends on operands[], which isn't usually allowed
> for named patterns.  We get away with it at the moment because we only
> test for the existence of HAVE_indirect_jump, not its value:

yeah, I hit this a while ago and filed bug 66114.  It looks like I had
trouble with fr30 too, is that fixed now?

Trev
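
The difference between testing that a HAVE_* macro exists and testing its value can be sketched in plain C. This is only an illustration: HAVE_demo_jump, demo_word_mode_p and both helper functions are hypothetical names, not anything generated by GCC, and the run-time flag merely stands in for a condition that cannot be decided when the macro is defined.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a generated HAVE_<pattern> macro: it is always
   defined, but its value is a condition that may be false at run time.  */
static bool demo_word_mode_p = false;
#define HAVE_demo_jump (demo_word_mode_p)

/* Existence test: true whenever the macro is defined, whatever its value.  */
static bool
demo_jump_exists (void)
{
#ifdef HAVE_demo_jump
  return true;
#else
  return false;
#endif
}

/* Value test: evaluates the condition itself.  */
static bool
demo_jump_enabled (void)
{
  return HAVE_demo_jump;
}
```

With the flag false, demo_jump_exists() still returns true while demo_jump_enabled() returns false, which is the gap the quoted message describes.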



Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt

2015-07-01 Thread Zinovy Nis
Has anyone had a chance to compare the FP implementation in compiler_rt? I
still wonder why the sizes differ so much. Is the implementation in
compiler_rt incomplete?
compiler_rt claims it is IEEE-compliant.

2015-06-30 23:10 GMT+03:00 Joseph Myers :
> On Tue, 30 Jun 2015, H.J. Lu wrote:
>
>> > soft-fp is expected to be used on 32-bit and 64-bit systems for which a
>> > few kB code size is insignificant.
>>
>> Size is very important for IA MCU.  Would it be acceptable to update
>> soft-fp to optimize for size with
>>
>> #ifdef __OPTIMIZE_SIZE__
>> #else
>> #endif
>
> I don't think there's any low-hanging fruit for size optimization.  If you
> wanted to save that 6 or 7 kB (total, across all the float and double code
> in libgcc, as compared to fp-bit or ieeelib and mentioned in the Summit
> paper) you'd structure the library completely differently, making no
> attempt to support rounding modes, exceptions, signs of NaNs, choice of
> NaN results or any particular choice for anything not specified in IEEE,
> and using common functions whenever appropriate for things such as
> unpacking / packing, or shared addition / subtraction code, instead of
> inlining everything with macros.  The result would be a completely
> different library design that wouldn't have anything much to share with
> soft-fp.  Indeed, such a library might best be written in assembly code
> for each architecture (much like the existing code for ARM in libgcc).
>
> It would not surprise me if carefully examining the code generated for
> soft-fp functions (possibly the final assembly code, possibly the
> optimized GIMPLE) would show up scope for a few microoptimizations in GCC
> where it could optimize the soft-fp code better, but I expect the effects
> of such microoptimizations would be fairly small.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Vladimir Makarov



On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>> I'm working on a massive set of cleanups to Linux's syscall handling.
>> We currently have a nasty optimization in which we don't save rbx,
>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>> This works, but it makes the code a huge mess.  I'd rather save all
>> regs in asm and then call C code.
>>
>> Unfortunately, this will add five cycles (on SNB) to one of the
>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>> request that might not be all that crazy.  When writing C functions
>> intended to be called from asm, what if we could do:
>>
>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>> "r15"))) void func(void);
>>
>> This will save enough pushes and pops that it could easily give us our
>> five cycles back and then some.  It's also easy to be compatible with
>> old GCC versions -- we could just omit the attribute, since preserving
>> a register is always safe.
>>
>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>
>> (I'm not necessarily suggesting that we do this for the syscall bodies
>> themselves.  I want to do it for the entry and exit helpers, so we'd
>> still lose the five cycles in the full fast-path case, but we'd do
>> better in the slower paths, and the slower paths are becoming
>> increasingly important in real workloads.)
>
> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> options, which allow to tweak the calling conventions; but it is per
> translation unit right now.  It isn't clear which of these options
> you mean with the extra_clobber.
> I assume you are looking for a possibility to change this to be
> per-function, with caller with a different calling convention having to
> adjust for different ABI callee.  To some extent, recent GCC versions
> do that automatically with -fipa-ra already - if some call used registers
> are not clobbered by some call and the caller can analyze that callee,
> it can stick values in such registers across the call.
> I'd say the most natural API for this would be to allow
> f{fixed,call-{used,saved}}-REG in target attribute.

One consequence of frequently changing the calling convention or register
usage per function could be a GCC slowdown.  RA calculates a lot of data, and
it takes a lot of time to recalculate it after something in the register
usage convention changes.

Another consequence would be that RA fails to generate code in some cases
and, even worse, the failure might depend on the GCC version (I have already
seen PRs where RA worked for an asm in one GCC version because a pseudo was
replaced by an equivalent constant, and failed in another GCC version where
that did not happen).

Other than that, I don't see other complications with implementing such a
feature.




Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Andy Lutomirski
On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov  wrote:
>
>
> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>
>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>> We currently have a nasty optimization in which we don't save rbx,
>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>> This works, but it makes the code a huge mess.  I'd rather save all
>>> regs in asm and then call C code.
>>>
>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>>> request that might not be all that crazy.  When writing C functions
>>> intended to be called from asm, what if we could do:
>>>
>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>> "r15"))) void func(void);
>>>
>>> This will save enough pushes and pops that it could easily give us our
>>> five cycles back and then some.  It's also easy to be compatible with
>>> old GCC versions -- we could just omit the attribute, since preserving
>>> a register is always safe.
>>>
>>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>>
>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>> still lose the five cycles in the full fast-path case, but we'd do
>>> better in the slower paths, and the slower paths are becoming
>>> increasingly important in real workloads.)
>>
>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>> options, which allow to tweak the calling conventions; but it is per
>> translation unit right now.  It isn't clear which of these options
>> you mean with the extra_clobber.
>> I assume you are looking for a possibility to change this to be
>> per-function, with caller with a different calling convention having to
>> adjust for different ABI callee.  To some extent, recent GCC versions
>> do that automatically with -fipa-ra already - if some call used registers
>> are not clobbered by some call and the caller can analyze that callee,
>> it can stick values in such registers across the call.
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>>
> One consequence of frequently changing the calling convention or register
> usage per function could be a GCC slowdown.  RA calculates a lot of data, and
> it takes a lot of time to recalculate it after something in the register
> usage convention changes.

Do you mean that RA precalculates things based on the calling
convention and saves it across functions?  Hmm.  I don't think this
would be a big problem in my intended use case -- there would only be
a handful of functions using this extension, and they'd have very few
non-asm callers.

>
> Another consequence would be that RA fails to generate code in some cases
> and, even worse, the failure might depend on the GCC version (I have already
> seen PRs where RA worked for an asm in one GCC version because a pseudo was
> replaced by an equivalent constant, and failed in another GCC version where
> that did not happen).
>

Would this be a problem generating code for a function with extra
"used" regs or just a problem generating code to call such a function?
I imagine that, in the former case, RA's job would be easier, not
harder, since there would be more registers to work with.  In
practice, though, I think it would just end up changing the prologue
and epilogue.

--Andy
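
The compatibility argument from the original proposal (omit the attribute where it is unsupported, since preserving a register is always safe) could be wrapped in a macro along these lines. This is a sketch under assumptions: extra_clobber is only the attribute proposed in this thread and is not implemented by any released GCC, and ASM_CALLEE_CLOBBERS and entry_helper are hypothetical names.

```c
/* extra_clobber is the attribute proposed in this thread; no released GCC
   implements it, so on real compilers the macro below expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(extra_clobber)
#  define ASM_CALLEE_CLOBBERS(...) __attribute__((extra_clobber(__VA_ARGS__)))
# endif
#endif
#ifndef ASM_CALLEE_CLOBBERS
/* Fallback: the callee simply preserves the registers as usual.  */
# define ASM_CALLEE_CLOBBERS(...)
#endif

/* Hypothetical helper intended to be called from asm.  */
ASM_CALLEE_CLOBBERS("rbx", "rbp", "r12", "r13", "r14", "r15")
int entry_helper (int nr);

int
entry_helper (int nr)
{
  return nr + 1;
}
```

On a compiler without the attribute the declaration is an ordinary one, so the only cost is the lost optimization.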


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Jakub Jelinek
On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
> >>(I'm not necessarily suggesting that we do this for the syscall bodies
> >>themselves.  I want to do it for the entry and exit helpers, so we'd
> >>still lose the five cycles in the full fast-path case, but we'd do
> >>better in the slower paths, and the slower paths are becoming
> >>increasingly important in real workloads.)
> >GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> >options, which allow to tweak the calling conventions; but it is per
> >translation unit right now.  It isn't clear which of these options
> >you mean with the extra_clobber.
> >I assume you are looking for a possibility to change this to be
> >per-function, with caller with a different calling convention having to
> >adjust for different ABI callee.  To some extent, recent GCC versions
> >do that automatically with -fipa-ra already - if some call used registers
> >are not clobbered by some call and the caller can analyze that callee,
> >it can stick values in such registers across the call.
> >I'd say the most natural API for this would be to allow
> >f{fixed,call-{used,saved}}-REG in target attribute.
> >
> >
> One consequence of frequently changing the calling convention or register
> usage per function could be a GCC slowdown.  RA calculates a lot of data, and
> it takes a lot of time to recalculate it after something in the register
> usage convention changes.

That is true.  i?86/x86_64 is a switchable target, so at least for the case
of info computed for the callee with non-standard calling convention such
info can be computed just once when the function with such a target
attribute would be seen first.  But for the caller side, I agree not
everything can be precomputed, if we can't use e.g. regsets saved in the
callee; as a single function can call different functions with different
ABIs.  But to some extent we have that already with -fipa-ra, don't we?

Jakub


Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt

2015-07-01 Thread Zinovy Nis
The only idea I have about the size difference is this:

the header text in many of the FP-emulation files in compiler_rt contains lines like:

// This file implements quad-precision soft-float addition ***with the
IEEE-754 default rounding*** (to nearest, ties to even).


2015-07-01 16:59 GMT+03:00 Zinovy Nis :
> Has anyone had a chance to compare the FP implementation in compiler_rt? I
> still wonder why the sizes differ so much. Is the implementation in
> compiler_rt incomplete?
> compiler_rt claims it is IEEE-compliant.
>
> 2015-06-30 23:10 GMT+03:00 Joseph Myers :
>> On Tue, 30 Jun 2015, H.J. Lu wrote:
>>
>>> > soft-fp is expected to be used on 32-bit and 64-bit systems for which a
>>> > few kB code size is insignificant.
>>>
>>> Size is very important for IA MCU.  Would it be acceptable to update
>>> soft-fp to optimize for size with
>>>
>>> #ifdef __OPTIMIZE_SIZE__
>>> #else
>>> #endif
>>
>> I don't think there's any low-hanging fruit for size optimization.  If you
>> wanted to save that 6 or 7 kB (total, across all the float and double code
>> in libgcc, as compared to fp-bit or ieeelib and mentioned in the Summit
>> paper) you'd structure the library completely differently, making no
>> attempt to support rounding modes, exceptions, signs of NaNs, choice of
>> NaN results or any particular choice for anything not specified in IEEE,
>> and using common functions whenever appropriate for things such as
>> unpacking / packing, or shared addition / subtraction code, instead of
>> inlining everything with macros.  The result would be a completely
>> different library design that wouldn't have anything much to share with
>> soft-fp.  Indeed, such a library might best be written in assembly code
>> for each architecture (much like the existing code for ARM in libgcc).
>>
>> It would not surprise me if carefully examining the code generated for
>> soft-fp functions (possibly the final assembly code, possibly the
>> optimized GIMPLE) would show up scope for a few microoptimizations in GCC
>> where it could optimize the soft-fp code better, but I expect the effects
>> of such microoptimizations would be fairly small.
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com


Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt

2015-07-01 Thread Szabolcs Nagy
On 01/07/15 16:34, Zinovy Nis wrote:
> The only idea I have about the size difference is this:
> 
> the header text in many of the FP-emulation files in compiler_rt contains
> lines like:
> 
> // This file implements quad-precision soft-float addition ***with the
> IEEE-754 default rounding*** (to nearest, ties to even).
> 

nearest rounding and no exception flags.

in other words they assume no fenv access.



Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt

2015-07-01 Thread Joseph Myers
On Wed, 1 Jul 2015, Zinovy Nis wrote:

> Has anyone had a chance to compare the FP implementation in compiler_rt? I
> still wonder why the sizes differ so much. Is the implementation in
> compiler_rt incomplete?
> compiler_rt claims it is IEEE-compliant.

If you examine the implementation approaches, you will see that apart from 
the compiler_rt code not being set up for rounding mode and exceptions 
support (and in some cases, it can be hard to completely optimize generic 
code as much as code that never has to consider those issues), it (for 
addition) does normalization in one place, and swaps the arguments in one 
place so as to know which has the larger magnitude, whereas soft-fp tries 
to reduce the amount of processing in each case by only normalizing when 
and to the extent needed and duplicating code for each choice of which 
argument has the larger exponent (and having separate code for the case of 
equal exponents).

-- 
Joseph S. Myers
jos...@codesourcery.com
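
The "swap the arguments in one place" approach described above can be sketched as follows. This is not code from compiler_rt or soft-fp; it assumes IEEE-754 binary32 for float and finite operands, and float_bits and order_by_magnitude are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern; assumes IEEE-754 binary32.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Order two finite operands so that *larger has the larger magnitude.
   With the sign bit masked off, binary32 patterns of finite values
   compare like the magnitudes themselves, so a single integer
   comparison and one swap are enough.  */
static void
order_by_magnitude (uint32_t a, uint32_t b,
                    uint32_t *larger, uint32_t *smaller)
{
  const uint32_t abs_mask = UINT32_C (0x7fffffff);
  if ((a & abs_mask) < (b & abs_mask))
    {
      uint32_t t = a;
      a = b;
      b = t;
    }
  *larger = a;
  *smaller = b;
}
```

After this step, the rest of an addition routine only has to handle one ordering of the operands, which is the code-size saving the message attributes to doing the swap once.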


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Vladimir Makarov



On 07/01/2015 11:31 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
>>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>>> still lose the five cycles in the full fast-path case, but we'd do
>>>> better in the slower paths, and the slower paths are becoming
>>>> increasingly important in real workloads.)
>>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>>> options, which allow to tweak the calling conventions; but it is per
>>> translation unit right now.  It isn't clear which of these options
>>> you mean with the extra_clobber.
>>> I assume you are looking for a possibility to change this to be
>>> per-function, with caller with a different calling convention having to
>>> adjust for different ABI callee.  To some extent, recent GCC versions
>>> do that automatically with -fipa-ra already - if some call used registers
>>> are not clobbered by some call and the caller can analyze that callee,
>>> it can stick values in such registers across the call.
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>> One consequence of frequently changing the calling convention or register
>> usage per function could be a GCC slowdown.  RA calculates a lot of data, and
>> it takes a lot of time to recalculate it after something in the register
>> usage convention changes.
>
> That is true.  i?86/x86_64 is a switchable target, so at least for the case
> of info computed for the callee with non-standard calling convention such
> info can be computed just once when the function with such a target
> attribute would be seen first.

Yes, a more clever way could be used.  We can calculate the info for a
specific calling convention, save it, and reuse it for functions with
the same attributes.  The compilation speed will be OK even with the
current implementation if there are few calling convention changes.

> But for the caller side, I agree not
> everything can be precomputed, if we can't use e.g. regsets saved in the
> callee; as a single function can call different functions with different
> ABIs.  But to some extent we have that already with -fipa-ra, don't we?

Yes, for -fipa-ra, if we have seen the function, we know what registers it
actually clobbers.  If we have not processed it yet, we use the worst-case
scenario (clobbering all call-clobbered registers according to the calling
convention).

Actually, this raises a question for me.  Suppose we describe a function as
clobbering more than the calling convention allows, then use it as a value
(assigning it to a variable or passing it as an argument), lose track of it,
and then call it.  How can RA know what the call actually clobbers?  So for a
function with such attributes we should prohibit using it as a value, or make
the attributes part of the function type, or at least say it is unsafe.
So now I see this as a *bigger problem* with this extension, although I
guess it already exists, as we have descriptions of different ABIs as an
extension.




Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Andy Lutomirski
On Wed, Jul 1, 2015 at 10:35 AM, Vladimir Makarov  wrote:
> Actually, this raises a question for me.  Suppose we describe a function as
> clobbering more than the calling convention allows, then use it as a value
> (assigning it to a variable or passing it as an argument), lose track of it,
> and then call it.  How can RA know what the call actually clobbers?  So for a
> function with such attributes we should prohibit using it as a value, or make
> the attributes part of the function type, or at least say it is unsafe.

I think it should be part of the type.  This shouldn't compile:

void func(void) __attribute__((used_reg("r12")));
void (*x)(void);
x = func;

--Andy
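
For comparison, the existing x86 ABI attributes mentioned earlier in the thread can already be written on a function pointer type as well as on a function. A minimal sketch, assuming GCC or a compatible compiler on x86-64 (elsewhere the macro expands to nothing); DEMO_MS_ABI, add_one, ms_abi_fn and call_through_pointer are hypothetical names, and this only illustrates "the convention travels with the type", not how the proposed attribute would be implemented.

```c
#if defined(__GNUC__) && defined(__x86_64__)
# define DEMO_MS_ABI __attribute__((ms_abi))
#else
/* Attribute unavailable here: fall back to the default convention.  */
# define DEMO_MS_ABI
#endif

/* A function that uses a non-default calling convention.  */
static int DEMO_MS_ABI
add_one (int x)
{
  return x + 1;
}

/* The pointer type names the same convention, so a call through it
   is compiled with the callee's actual convention in mind.  */
typedef int (DEMO_MS_ABI *ms_abi_fn) (int);

static int
call_through_pointer (ms_abi_fn fn, int x)
{
  return fn (x);
}
```

A pointer declared without the attribute would describe a different convention, which is the mismatch the snippet above wants the compiler to reject.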


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Jakub Jelinek
On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
> Actually, this raises a question for me.  Suppose we describe a function as
> clobbering more than the calling convention allows, then use it as a value
> (assigning it to a variable or passing it as an argument), lose track of it,
> and then call it.  How can RA know what the call actually clobbers?  So for a
> function with such attributes we should prohibit using it as a value, or make
> the attributes part of the function type, or at least say it is unsafe.
> So now I see this as a *bigger problem* with this extension, although I
> guess it already exists, as we have descriptions of different ABIs as an
> extension.

Unfortunately, the target attribute is a function decl attribute rather than
a function type attribute.  And having more attributes affect switchable
targets will be non-fun.

Jakub


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Vladimir Makarov



On 07/01/2015 11:27 AM, Andy Lutomirski wrote:
> On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov  wrote:
>> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>>> We currently have a nasty optimization in which we don't save rbx,
>>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>>> This works, but it makes the code a huge mess.  I'd rather save all
>>>> regs in asm and then call C code.
>>>>
>>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>>>> request that might not be all that crazy.  When writing C functions
>>>> intended to be called from asm, what if we could do:
>>>>
>>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>>> "r15"))) void func(void);
>>>>
>>>> This will save enough pushes and pops that it could easily give us our
>>>> five cycles back and then some.  It's also easy to be compatible with
>>>> old GCC versions -- we could just omit the attribute, since preserving
>>>> a register is always safe.
>>>>
>>>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>>>
>>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>>> still lose the five cycles in the full fast-path case, but we'd do
>>>> better in the slower paths, and the slower paths are becoming
>>>> increasingly important in real workloads.)
>>>
>>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>>> options, which allow to tweak the calling conventions; but it is per
>>> translation unit right now.  It isn't clear which of these options
>>> you mean with the extra_clobber.
>>> I assume you are looking for a possibility to change this to be
>>> per-function, with caller with a different calling convention having to
>>> adjust for different ABI callee.  To some extent, recent GCC versions
>>> do that automatically with -fipa-ra already - if some call used registers
>>> are not clobbered by some call and the caller can analyze that callee,
>>> it can stick values in such registers across the call.
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>> One consequence of frequently changing the calling convention or register
>> usage per function could be a GCC slowdown.  RA calculates a lot of data, and
>> it takes a lot of time to recalculate it after something in the register
>> usage convention changes.
>
> Do you mean that RA precalculates things based on the calling
> convention and saves it across functions?

RA calculates a lot of info (register classes, class x class relations, etc.)
based on the register usage convention (fixed regs, call-used registers,
etc.).  If the register usage convention has not changed since the previous
function compilation, RA reuses the info.  Otherwise, RA recalculates it.

> Hmm.  I don't think this
> would be a big problem in my intended use case -- there would only be
> a handful of functions using this extension, and they'd have very few
> non-asm callers.

Good.  I guess it will be rarely used and people will tolerate some
extra compilation time.

>> Another consequence would be that RA fails to generate code in some cases
>> and, even worse, the failure might depend on the GCC version (I have already
>> seen PRs where RA worked for an asm in one GCC version because a pseudo was
>> replaced by an equivalent constant, and failed in another GCC version where
>> that did not happen).
>
> Would this be a problem generating code for a function with extra
> "used" regs or just a problem generating code to call such a function?
> I imagine that, in the former case, RA's job would be easier, not
> harder, since there would be more registers to work with.

Sorry, I meant that the problem will mostly arise when the attributes
describe more fixed regs.  If you describe more clobbered regs, they
can still be used by the allocator, which can spill/restore them (around
calls) when they cannot be used.  Still, I think there will be some rare
and complicated cases where even describing only clobbered regs can make
RA fail in a function calling the function with additional clobbered regs.

> In
> practice, though, I think it would just end up changing the prologue
> and epilogue.





Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Vladimir Makarov



On 07/01/2015 01:43 PM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually, this raises a question for me.  Suppose we describe a function as
>> clobbering more than the calling convention allows, then use it as a value
>> (assigning it to a variable or passing it as an argument), lose track of it,
>> and then call it.  How can RA know what the call actually clobbers?  So for a
>> function with such attributes we should prohibit using it as a value, or make
>> the attributes part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension, although I
>> guess it already exists, as we have descriptions of different ABIs as an
>> extension.
>
> Unfortunately, the target attribute is a function decl attribute rather than
> a function type attribute.  And having more attributes affect switchable
> targets will be non-fun.

Making attributes a part of the type probably creates a lot of issues too.

Although I am not a front-end developer, I still think it is hard to
implement in the front end.  Sticking fully to this approach, it would be
logical to describe this in debug info (I am not sure it is even
possible).

Portability would be an issue too.  It is hard to prevent a regular
C developer from assigning such a function to a variable, because it is OK
on his system, while compilation of such code may fail on another system.




Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Andy Lutomirski
On Wed, Jul 1, 2015 at 10:43 AM, Jakub Jelinek  wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually, this raises a question for me.  Suppose we describe a function as
>> clobbering more than the calling convention allows, then use it as a value
>> (assigning it to a variable or passing it as an argument), lose track of it,
>> and then call it.  How can RA know what the call actually clobbers?  So for a
>> function with such attributes we should prohibit using it as a value, or make
>> the attributes part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension, although I
>> guess it already exists, as we have descriptions of different ABIs as an
>> extension.
>
> Unfortunately, the target attribute is a function decl attribute rather than
> a function type attribute.  And having more attributes affect switchable
> targets will be non-fun.

Just to make sure we're on the same page here, if I write:

extern void normal_func(void);

void weird_func(void) __attribute__((used_regs("r12")))
{
  // do something
  normal_func();
  // do something
}

I'd want the code that calls normal_func() to understand that
normal_func() *will* preserve r12 despite the fact that weird_func is
allowed to clobber r12.  I think this means that the attribute would
have to be an attribute of a function, not of the RA while compiling
the function.

--Andy


making the new if-converter not mangle IR that is already vectorizer-friendly

2015-07-01 Thread Abe

Dear all,

[Please feel free to skip to the second instance of "end of introductions"
 and read the introduction sections later or never.]




Hi!  My name is Abe.  Although I`m from New York City, I`ve been living in
Texas for about 5 years now, due to having been "sucked in" to Texas by
Texas A&M University and staying in Texas for an excellent job at the
Samsung Austin R&D Center ["SARC"], where the compiler team of which I am
a part is working on GCC.




As some of you already know, at SARC we are working on a new "if converter"
to help convert simple "if"-based blocks of code that appear inside loops
into an autovectorizer-friendly form that closely resembles the C ternary
operator ["c ? x : y"].  GCC already has such a converter, but it is off by
default, in part because it is unsafe: if enabled, it can cause certain code
to be transformed in such a way that it malfunctions even though the
non-converted code worked just fine with the same inputs.  The new converter,
originally by my teammate Sebastian Pop, is safer [almost-always safe *];
we are working on getting it into good-enough shape that the always-safe
transformations can be turned on by default whenever the autovectorizer
is on.

* Always safe for stores, sometimes a little risky for loads:
  speculative loads might cause multithreaded programs with
  insufficient locking to fail due to writes by another thread
  being "lost"/"missed", even though the same program works OK
  "by luck" when compiled without if-conversion of loads.
  This risk comes mainly/only from what the relevant literature
  calls a "half hammock": an "if" with a "then" section but no
  "else" section [or effectively vice-versa, e.g. an empty "then"
  and a non-empty "else"].  In this case, e.g. "if (c)  X[x] = Y[y];"
  with no attached "else" section is risky to fully if-convert
  in the event of the code being compiled running multithreaded
  and not having been written with all the locking it really needs.
  Respectively, e.g. "if (c)  ; /* empty ''then'' */  else  X[x] = Y[y];".




[end of introductions]




One of the reasons the new if converter has not yet been submitted
for incorporation into GCC`s trunk is that it still has some
performance regressions WRT the old converter, and most of those
are "true regressions", i.e. not just because the old converter
was less safe and the additional safety is what is causing the loss,
but rather because there is more work to do before the patch is ready.

As of this writing, the new if converter sometimes tries
to "convert" something that is already vectorizer-friendly,
and in doing so it renders that code now-NOT-vectorizer-friendly.
I think this is the first class of true regression that I should fix.
The question is how to do so.

My first choice was to try to have the if converter ask the vectorizer
"is this code already vectorizable?", but that seems to not be feasible.

The second choice is to use loop versioning to defer the decision to the
vectorizer itself.  In other words, when the if converter sees what it
"thinks" is an opportunity to convert a loop, it will duplicate that loop
inside a new "if" and convert exactly one of the duplicates, producing
something like this:

  if (__vectorizable_0001__) {
    /* if-converted loop */
  } else {
    /* original loop */
  }

Under this plan, the vectorizer will be modified to detect code such as the
above.  When appropriate, it will check whether the if-converted code is
vectorizable and then replace e.g. "__vectorizable_0001__" with either (0)
or (1) as per the check, thus allowing dead-code elimination to clean up
the temporary mess.

Sebastian has suggested that we add something like
"tree ifConversion_condition;" to "struct loop" so that we can keep track
of e.g. "__vectorizable_0001__" and use it in the vectorizer.  When there
is no relevant conversion, this field will be set to null.  This seems
like a good plan to me.
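
As a source-level illustration of the versioning idea, here is a small sketch
in C.  It is only an analogy: the real transformation would happen on the
compiler's internal representation, and the parameter "vectorizable" below is
a hypothetical stand-in for the "__vectorizable_0001__" placeholder, made an
ordinary argument so that the sketch can be compiled and run.

```c
#include <assert.h>
#include <stddef.h>

/* Two copies of the same loop behind one guard.  Whichever value the
   guard is folded to, the other copy becomes dead code. */
void versioned_loop(int *X, const int *Y, const int *c, size_t n,
                    int vectorizable /* hypothetical stand-in */)
{
  assert(X && Y && c);
  if (vectorizable)
    {
      /* if-converted loop: branch-free body */
      for (size_t i = 0; i < n; i++)
        X[i] = c[i] ? Y[i] : X[i];
    }
  else
    {
      /* original loop: control flow left as written */
      for (size_t i = 0; i < n; i++)
        if (c[i])
          X[i] = Y[i];
    }
}
```

Calling it with vectorizable set to 0 or to 1 gives the same result in a
single thread, which is what lets the vectorizer pick either copy and leave
the other for dead-code elimination.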

I/we seek your feedback on the above.




Regards,

Abe


gcc-4.9-20150701 is now available

2015-07-01 Thread gccadmin
Snapshot gcc-4.9-20150701 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150701/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options:
svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 225283

You'll find:

 gcc-4.9-20150701.tar.bz2 Complete GCC

  MD5=2c52d82836abd52743fce577121aa0a2
  SHA1=dfaa9f98bb6e067d209e98ac5f5dcbb924e95f31

Diffs from 4.9-20150624 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


RFC: Add R_X86_64_INDBR_GOTPCREL and R_386_INDBR_GOT32

2015-07-01 Thread H.J. Lu
To avoid indirect branch to locally defined functions, I am proposing to
add a new relocation, R_X86_64_INDBR_GOTPCREL, to x86-64 psABI: 

1. When branching to an external function, foo, toolchain generates
     call/jmp *foo@GOTPCREL(%rip)
   with R_X86_64_INDBR_GOTPCREL relocation, instead of
     call/jmp foo[@PLT]
2. When function foo is locally defined, linker converts
     call/jmp *foo@GOTPCREL(%rip)
   to
     nop call/jmp foo
3. Otherwise, linker treats R_X86_64_INDBR_GOTPCREL the same way as
   R_X86_64_GOTPCREL.
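
Step 2 can be sketched at the byte level.  The helper below is hypothetical
and not taken from any real linker; it assumes the usual x86-64 encodings
(FF 15 disp32 for "call *disp32(%rip)", FF 25 disp32 for "jmp *disp32(%rip)",
E8/E9 rel32 for direct call/jmp, 90 for nop) and rewrites the 6-byte indirect
branch in place as a 1-byte nop followed by a 5-byte direct branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rewrite "call/jmp *foo@GOTPCREL(%rip)" at insn
   into "nop; call/jmp foo".  insn_addr is the address of insn[0] and
   target_addr the address of the locally defined foo.
   Returns 1 if the bytes were rewritten, 0 if they were left alone. */
int relax_indbr_gotpcrel(uint8_t insn[6], uint64_t insn_addr,
                         uint64_t target_addr)
{
  uint8_t direct_opcode;

  assert(insn != 0);
  if (insn[0] != 0xff)
    return 0;
  if (insn[1] == 0x15)
    direct_opcode = 0xe8;       /* call rel32 */
  else if (insn[1] == 0x25)
    direct_opcode = 0xe9;       /* jmp rel32 */
  else
    return 0;

  /* rel32 is relative to the end of the direct branch, which is also
     the end of the original 6-byte instruction.  The unsigned-to-signed
     conversion assumes two's complement. */
  int64_t disp = (int64_t) (target_addr - (insn_addr + 6));
  if (disp < INT32_MIN || disp > INT32_MAX)
    return 0;                   /* out of range: keep the indirect form */
  uint32_t rel32 = (uint32_t) (int32_t) disp;

  insn[0] = 0x90;               /* the one-byte nop */
  insn[1] = direct_opcode;
  insn[2] = (uint8_t) (rel32 & 0xff);   /* little-endian rel32 */
  insn[3] = (uint8_t) ((rel32 >> 8) & 0xff);
  insn[4] = (uint8_t) ((rel32 >> 16) & 0xff);
  insn[5] = (uint8_t) ((rel32 >> 24) & 0xff);
  return 1;
}
```

Because both forms are 6 bytes long, no other code has to move, which is why
the cost of guessing "external" wrongly is only the nop.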

For i386 psABI, we add R_386_INDBR_GOT32: 

1. When branching to an external function, foo, in non-PIC mode,
   toolchain generates
     call/jmp *foo@GOT
   with R_386_INDBR_GOT32 relocation, instead of
     call/jmp foo
   and in PIC mode
     call/jmp *foo@GOT(%reg)
   with R_386_INDBR_GOT32 relocation, where %reg holds the address
   of GOT, instead of
     call/jmp foo@PLT
2. When function foo is locally defined, linker converts
     call/jmp *foo@GOT[(%reg)]
   to
     nop call/jmp foo
3. Otherwise,
   a. In PIC mode, linker treats R_386_INDBR_GOT32 the same way as
      R_386_GOT32 and "call/jmp *foo@GOT" is unsupported.
   b. In non-PIC mode, linker computes its relocation value as relocation
      value of R_386_GOT32 plus the address of GOT and converts
        call/jmp *foo@GOT(%reg)
      to
        call/jmp *foo@GOT
      if needed.

This new relocation effectively turns off lazy binding on function, foo. 

For i386, compiler is free to choose any register to hold the address of 
GOT and there is no need to make EBX a fixed register when branching to 
an external function in PIC mode. 

With this new relocation, only a one-byte NOP prefix overhead is added 
when function, foo, which compiler determines is external, turns out to 
be local at link-time, because of -Bsymbolic or a definition in another 
input object file which compiler has no knowledge of. 

The new -fno-plt GCC option can use R_X86_64_INDBR_GOTPCREL and 
R_386_INDBR_GOT32 relocations if linker supports them to avoid indirect 
branch to internal functions. 

For x86-64 GCC, it is implemented in assembler and linker.  Assembler
should generate R_X86_64_INDBR_GOTPCREL relocation, instead of
R_X86_64_GOTPCREL relocation, for “call/jmp *foo@GOTPCREL(%rip)”.

For i386 GCC, most is implemented in assembler and linker.  Assembler
should generate R_386_INDBR_GOT32 relocation, instead of R_386_GOT32
relocation, for “call/jmp *foo@GOT(%reg)”.  GCC also needs to be modified
to generate “call/jmp *foo@GOT” in non-PIC mode.


H.J.


rl78 vs cse vs memory_address_addr_space

2015-07-01 Thread DJ Delorie

In this bit of code in explow.c:

  /* By passing constant addresses through registers
     we get a chance to cse them.  */
  if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x))
    x = force_reg (address_mode, x);

On the rl78 it results in code that's a bit too complex for later
passes to be optimized fully.  Is there any way to indicate that the
above force_reg() is bad for a particular target?


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread H. Peter Anvin
On 07/01/2015 10:43 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually it raise a question for me.  If we describe that a function
>> clobbers more than calling convention and then use it as a value (assigning
>> a variable or passing as an argument) and loosing a track of it and than
>> call it.  How can RA know what the call clobbers actually.  So for the
>> function with the attributes we should prohibit use it as a value or make
>> the attributes as a part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension.  Although I
>> guess it already exists as we have description of different ABI as an
>> extension.
> 
> Unfortunately target attribute is function decl attribute rather than
> function type.  And having more attributes affect switchable targets will be
> non-fun.
> 

How on Earth does that work with existing switchable ABIs?  Keep in mind
that we already support multiple ABIs...

-hpa