Re: libgcc: strange optimization

2011-08-02 Thread Mikael Pettersson
Hans-Peter Nilsson writes:
 > On Mon, 1 Aug 2011, Richard Henderson wrote:
 > 
 > > On 08/01/2011 01:30 PM, Michael Walle wrote:
 > > >  1) function inlining
 > > >  2) deferred argument evaluation
 > > >  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
 > > > function call to libgcc's __ashrsi3 (_in place_!)
 > > >  4) BAM! dead code elimination optimizes r8 assignment away because calli
 > > > may clobber r1-r10 (callee saved registers on lm32).
 > >
 > > I'm afraid the only solution I can think of is to force F1 out-of-line.
 > 
 > Or another temporary - but the parameter should already have
 > that effect.

It should, but doesn't.  See PR48863 for similar breakage on ARM.

/Mikael
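
Step 3 above turns on the fact that, without a barrel shifter, a shift by a constant becomes an ordinary function call. As a rough illustration only, here is a minimal sketch of what a software arithmetic right shift might look like. The name soft_ashr32 is hypothetical, and this is not libgcc's actual __ashrsi3 source; it only shows why the operation needs a real call that is free to use call-clobbered registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a software arithmetic right shift.
 * Assumption: NOT the real libgcc __ashrsi3, just a one-bit-at-a-time
 * loop of the kind a target without a barrel shifter has to execute. */
int32_t soft_ashr32(int32_t value, unsigned count)
{
    uint32_t bits = (uint32_t)value;
    uint32_t sign = bits & UINT32_C(0x80000000);

    assert(count < 32);              /* larger counts are not handled here */

    while (count-- > 0)
        bits = (bits >> 1) | sign;   /* shift one place, re-insert the sign bit */

    /* Convert back without relying on implementation-defined narrowing. */
    if (sign)
        return -(int32_t)(~bits) - 1;
    return (int32_t)bits;
}
```

Because the loop lives in a separate function, the compiler has to assume that the call overwrites every call-clobbered register, which is exactly what destroys a value placed in r8 before the call.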


Re: A case that PRE optimization hurts performance

2011-08-02 Thread Richard Guenther
On Tue, Aug 2, 2011 at 4:37 AM, Jiangning Liu  wrote:
> Hi,
>
> For the following simple test case, PRE optimization hoists computation
> (s!=1) into the default branch of the switch statement, and finally causes
> very poor code generation. This problem occurs in both X86 and ARM, and I
> believe it is also a problem for other targets.
>
> int f(char *t) {
>    int s=0;
>
>    while (*t && s != 1) {
>        switch (s) {
>        case 0:
>            s = 2;
>            break;
>        case 2:
>            s = 1;
>            break;
>        default:
>            if (*t == '-')
>                s = 1;
>            break;
>        }
>        t++;
>    }
>
>    return s;
> }
>
> Taking X86 as an example, with option "-O2" you may find 52 instructions
> generated like below,
>
>  :
>   0:   55                      push   %ebp
>   1:   31 c0                   xor    %eax,%eax
>   3:   89 e5                   mov    %esp,%ebp
>   5:   57                      push   %edi
>   6:   56                      push   %esi
>   7:   53                      push   %ebx
>   8:   8b 55 08                mov    0x8(%ebp),%edx
>   b:   0f b6 0a                movzbl (%edx),%ecx
>   e:   84 c9                   test   %cl,%cl
>  10:   74 50                   je     62 
>  12:   83 c2 01                add    $0x1,%edx
>  15:   85 c0                   test   %eax,%eax
>  17:   75 23                   jne    3c 
>  19:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
>  20:   0f b6 0a                movzbl (%edx),%ecx
>  23:   84 c9                   test   %cl,%cl
>  25:   0f 95 c0                setne  %al
>  28:   89 c7                   mov    %eax,%edi
>  2a:   b8 02 00 00 00          mov    $0x2,%eax
>  2f:   89 fb                   mov    %edi,%ebx
>  31:   83 c2 01                add    $0x1,%edx
>  34:   84 db                   test   %bl,%bl
>  36:   74 2a                   je     62 
>  38:   85 c0                   test   %eax,%eax
>  3a:   74 e4                   je     20 
>  3c:   83 f8 02                cmp    $0x2,%eax
>  3f:   74 1f                   je     60 
>  41:   80 f9 2d                cmp    $0x2d,%cl
>  44:   74 22                   je     68 
>  46:   0f b6 0a                movzbl (%edx),%ecx
>  49:   83 f8 01                cmp    $0x1,%eax
>  4c:   0f 95 c3                setne  %bl
>  4f:   89 df                   mov    %ebx,%edi
>  51:   84 c9                   test   %cl,%cl
>  53:   0f 95 c3                setne  %bl
>  56:   89 de                   mov    %ebx,%esi
>  58:   21 f7                   and    %esi,%edi
>  5a:   eb d3                   jmp    2f 
>  5c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
>  60:   b0 01                   mov    $0x1,%al
>  62:   5b                      pop    %ebx
>  63:   5e                      pop    %esi
>  64:   5f                      pop    %edi
>  65:   5d                      pop    %ebp
>  66:   c3                      ret
>  67:   90                      nop
>  68:   b8 01 00 00 00          mov    $0x1,%eax
>  6d:   5b                      pop    %ebx
>  6e:   5e                      pop    %esi
>  6f:   5f                      pop    %edi
>  70:   5d                      pop    %ebp
>  71:   c3                      ret
>
> But with command line option "-O2 -fno-tree-pre", there are only 12
> instructions generated, and the code would be very clean like below,
>
>  :
>   0:   55                      push   %ebp
>   1:   31 c0                   xor    %eax,%eax
>   3:   89 e5                   mov    %esp,%ebp
>   5:   8b 55 08                mov    0x8(%ebp),%edx
>   8:   80 3a 00                cmpb   $0x0,(%edx)
>   b:   74 0e                   je     1b 
>   d:   80 7a 01 00             cmpb   $0x0,0x1(%edx)
>  11:   b0 02                   mov    $0x2,%al
>  13:   ba 01 00 00 00          mov    $0x1,%edx
>  18:   0f 45 c2                cmovne %edx,%eax
>  1b:   5d                      pop    %ebp
>  1c:   c3                      ret
>
> Do you have any idea about this?

If you look at what PRE does it is clear that it's a profitable transformation
as it reduces the number of instructions computed on the paths where
s is set to a constant.  If you ask for size optimization you won't get
this transformation.

Now - on the tree level we don't see the if-conversion opportunity
(we don't have a conditional move instruction), but still we could
improve phiopt to handle switch statements - not sure what we
could do with the case in question though which is

:
  switch (s_3) , case 0: , case 2: >

:
  goto  ();

:
  if (D.2703_6 == 45)
goto ;
  else
goto  ();

:

  # s_2 = PHI <2(4), 1(3), 1(6), s_3(5)>
:

as the condition in at L3 complicates things.

The code is also really really special (and written in an awful way ...).

Richard.

> Thanks,
> -Jiangning
>
>
>
>
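
The 12-instruction listing suggests that the state machine collapses to a check of the first two characters. The sketch below restates the test case (with const added to the parameter) next to a hand-derived closed form. The name f_closed_form is hypothetical, and the equivalence is derived by hand from the source, not taken from compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* The test case from the report, unchanged apart from the const qualifier. */
int f(const char *t)
{
    int s = 0;

    while (*t && s != 1) {
        switch (s) {
        case 0:
            s = 2;
            break;
        case 2:
            s = 1;
            break;
        default:
            if (*t == '-')
                s = 1;
            break;
        }
        t++;
    }

    return s;
}

/* Hand-derived equivalent (hypothetical helper): s goes 0 -> 2 -> 1,
 * so the loop runs at most twice and the default case is never reached. */
int f_closed_form(const char *t)
{
    assert(t != NULL);
    if (t[0] == '\0')
        return 0;
    return t[1] != '\0' ? 1 : 2;
}
```

Seen this way, the s != 1 computation that PRE hoists into the default branch belongs to a path that cannot execute, which may explain why the transformation only costs code size here.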


Re: Performance degradation on g++ 4.6

2011-08-02 Thread Richard Guenther
On Mon, Aug 1, 2011 at 8:43 PM, Oleg Smolsky
 wrote:
> On 2011/7/29 14:07, Xinliang David Li wrote:
>>
>> Profiling tools are your best friend here. If you don't have access to
>> any, the least you can do is to build the program with -pg option and
>> use gprof tool to find out differences.
>
> The test suite has a bunch of very basic C++ tests that are executed an
> enormous number of times. I've built one with the obvious performance
> degradation and attached the source, output and reports.
>
> Here are some highlights:
>    v4.1:    Total absolute time for int8_t constant folding: 30.42 sec
>    v4.6:    Total absolute time for int8_t constant folding: 43.32 sec
>
> Every one of the tests in this section had degraded... the first half more
> than the second. I am not sure how much further I can take this - the
> benchmarked code is very short and plain. I can post disassembly for one
> (some?) of them if anyone is willing to take a look...

I have a hard time actually seeing what expressions they try to fold
(argh, templates everywhere ...).  One thing that changed between
4.1 and 4.6 is that we can no longer re-associate freely signed integers
because of undefined overflow concerns - which is a correctness issue.
Depending on the way the tests are written the folding in 4.1 was
probably a bug.

Richard.

> Thanks,
> Oleg.
>
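
To make the re-association point concrete: for signed int, (a + b) + c and a + (b + c) are not interchangeable, because one grouping can overflow where the other does not. The sketch below checks both groupings in a wider type. All function names are hypothetical, and it assumes long long is wider than int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if a + b does not fit in int. The sum is formed in long long,
 * so the check itself never overflows (assumes long long is wider). */
bool int_add_overflows(int a, int b)
{
    long long wide;

    assert(sizeof(long long) > sizeof(int));
    wide = (long long)a + (long long)b;
    return wide > INT_MAX || wide < INT_MIN;
}

/* True if (a + b) + c can be evaluated without signed overflow. */
bool left_assoc_is_defined(int a, int b, int c)
{
    if (int_add_overflows(a, b))
        return false;
    return !int_add_overflows(a + b, c);
}

/* True if a + (b + c) can be evaluated without signed overflow. */
bool right_assoc_is_defined(int a, int b, int c)
{
    if (int_add_overflows(b, c))
        return false;
    return !int_add_overflows(a, b + c);
}
```

With a = -1, b = INT_MAX, c = 1 the left grouping is well defined while the right one overflows in b + c, so a compiler that regroups the expression would introduce undefined behaviour the source never had.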


Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther
On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson  wrote:
> Hans-Peter Nilsson writes:
>  > On Mon, 1 Aug 2011, Richard Henderson wrote:
>  >
>  > > On 08/01/2011 01:30 PM, Michael Walle wrote:
>  > > >  1) function inlining
>  > > >  2) deferred argument evaluation
>  > > >  3) because our target has no barrel shifter, (arg >> 10) is emitted 
> as a
>  > > > function call to libgcc's __ashrsi3 (_in place_!)
>  > > >  4) BAM! dead code elimination optimizes r8 assignment away because 
> calli
>  > > > may clobber r1-r10 (callee saved registers on lm32).
>  > >
>  > > I'm afraid the only solution I can think of is to force F1 out-of-line.
>  >
>  > Or another temporary - but the parameter should already have
>  > that effect.
>
> It should, but doesn't.  See PR48863 for similar breakage on ARM.

On GIMPLE we see neither the libcall nor those "dependencies".

Don't use register vars.

Richard.

> /Mikael
>


Re: libgcc: strange optimization

2011-08-02 Thread Georg-Johann Lay
Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson  wrote:
>> Hans-Peter Nilsson writes:
>>  > On Mon, 1 Aug 2011, Richard Henderson wrote:
>>  >
>>  > > On 08/01/2011 01:30 PM, Michael Walle wrote:
>>  > > >  1) function inlining
>>  > > >  2) deferred argument evaluation
>>  > > >  3) because our target has no barrel shifter, (arg >> 10) is emitted 
>> as a
>>  > > > function call to libgcc's __ashrsi3 (_in place_!)
>>  > > >  4) BAM! dead code elimination optimizes r8 assignment away because 
>> calli
>>  > > > may clobber r1-r10 (callee saved registers on lm32).
>>  > >
>>  > > I'm afraid the only solution I can think of is to force F1 out-of-line.
>>  >
>>  > Or another temporary - but the parameter should already have
>>  > that effect.
>>
>> It should, but doesn't.  See PR48863 for similar breakage on ARM.
> 
> On GIMPLE we don't see either the libcall nor those "dependencies".
> 
> Don't use register vars.

IMO such code is supposed to work, e.g. in order to write an interface
to a non-ABI assembler function.  In general this cannot be expressed
by means of constraints because that would imply plethora of different
constraints for each register/mode.

From the documentation a user will expect that he wrote correct code
and is not supposed to bother with GCC internals like implicit library
calls or GIMPLE or whatever.

Johann

> Richard.




Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther
On Tue, Aug 2, 2011 at 12:01 PM, Georg-Johann Lay  wrote:
> Richard Guenther wrote:
>> On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson  wrote:
>>> Hans-Peter Nilsson writes:
>>>  > On Mon, 1 Aug 2011, Richard Henderson wrote:
>>>  >
>>>  > > On 08/01/2011 01:30 PM, Michael Walle wrote:
>>>  > > >  1) function inlining
>>>  > > >  2) deferred argument evaluation
>>>  > > >  3) because our target has no barrel shifter, (arg >> 10) is emitted 
>>> as a
>>>  > > > function call to libgcc's __ashrsi3 (_in place_!)
>>>  > > >  4) BAM! dead code elimination optimizes r8 assignment away because 
>>> calli
>>>  > > > may clobber r1-r10 (callee saved registers on lm32).
>>>  > >
>>>  > > I'm afraid the only solution I can think of is to force F1 out-of-line.
>>>  >
>>>  > Or another temporary - but the parameter should already have
>>>  > that effect.
>>>
>>> It should, but doesn't.  See PR48863 for similar breakage on ARM.
>>
>> On GIMPLE we don't see either the libcall nor those "dependencies".
>>
>> Don't use register vars.
>
> IMO such code is supposed to work, e.g. in order to write an interface
> to a non-ABI assembler function.  In general this cannot be expressed
> by means of constraints because that would imply plethora of different
> constraints for each register/mode.
>
> From the documentation a user will expect that he wrote correct code
> and is not supposed to bother with GCC inerts like implicit library
> calls or GIMPLE or whatever.

Well.  I suppose what is happening is that we expand from

  register int a1 __asm__ (*edi);
  register int a2 __asm__ (*eax);
  int D.2700;

:
  D.2700_2 = arg_1(D) >> 10;
  a1 = 10;
  a2 = D.2700_2;
  __asm__ __volatile__("scall" :  : "r" a1, "r" a2);
  return;

and end up TERing D.2700_2 = arg_1(D) >> 10, materializing a
libcall after the setting of a1 and before the asm.  To confirm that
try -fno-tree-ter.  I don't see how we can easily avoid this without
exposing libcalls at the gimple level.  Maybe disable TER if
we see any register variable use.

Richard.

> Johann
>
>> Richard.
>
>
>
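
For readers who want the source shape behind the GIMPLE above, here is a rough sketch: the shift is evaluated into a plain temporary first, and the register variables are assigned immediately before the asm. Everything in it is an assumption made for illustration. The function name is hypothetical, the registers r8/r9 and the lea instruction are x86-64 stand-ins for the lm32 "scall" sequence, and other targets take a plain C fallback. On x86-64 the shift is a single instruction, so the libcall hazard itself does not arise there; and as the thread shows, with TER enabled the temporary alone did not keep the libcall ahead of the register assignments.

```c
#include <assert.h>

/* Hypothetical illustration: pass two values to an asm in fixed registers.
 * Assumption: GNU C on x86-64; r8/r9 are arbitrary stand-ins, and the asm
 * merely adds its two operands so that the result can be checked. */
long add_in_fixed_regs(long first, long shift_input)
{
    long shifted;

    assert(shift_input >= 0);        /* keep >> well defined in portable C */

    /* Evaluate every non-trivial expression before touching the
     * register variables. */
    shifted = shift_input >> 10;

#if defined(__GNUC__) && defined(__x86_64__)
    {
        register long a1 __asm__("r8") = first;
        register long a2 __asm__("r9") = shifted;
        long result;

        __asm__ volatile("lea (%1,%2), %0" : "=r"(result) : "r"(a1), "r"(a2));
        return result;
    }
#else
    return first + shifted;          /* portable fallback, same result */
#endif
}
```

On a target where the shift becomes a call, the ordering matters: if the compiler moves the shift between the assignment to a1 and the asm, the call can overwrite the register holding a1.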


Re: libgcc: strange optimization

2011-08-02 Thread Michael Walle

Hi,

> To confirm that try -fno-tree-ter.

"lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
assembly code:

f2:
     addi     sp, sp, -4
     sw       (sp+4), ra
     addi     r2, r0, 10
     calli    __ashrsi3
     addi     r8, r0, 10
     scall
     lw       ra, (sp+4)
     addi     sp, sp, 4
     b        ra

-- 
Michael



Re: libgcc: strange optimization

2011-08-02 Thread Mikael Pettersson
Michael Walle writes:
 > 
 > Hi,
 > 
 > > To confirm that try -fno-tree-ter.
 > 
 > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
 > assembly code:
 > 
 > f2:
 >      addi     sp, sp, -4
 >      sw       (sp+4), ra
 >      addi     r2, r0, 10
 >      calli    __ashrsi3
 >      addi     r8, r0, 10
 >      scall
 >      lw       ra, (sp+4)
 >      addi     sp, sp, 4
 >      b        ra

-fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.


Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther
On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson  wrote:
> Michael Walle writes:
>  >
>  > Hi,
>  >
>  > > To confirm that try -fno-tree-ter.
>  >
>  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
>  > assembly code:
>  >
>  > f2:
>  >      addi     sp, sp, -4
>  >      sw       (sp+4), ra
>  >      addi     r2, r0, 10
>  >      calli    __ashrsi3
>  >      addi     r8, r0, 10
>  >      scall
>  >      lw       ra, (sp+4)
>  >      addi     sp, sp, 4
>  >      b        ra
>
> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.

It's of course only a workaround, not a real fix as nothing prevents
other optimizers from performing the re-scheduling TER does.

I suggest to amend the documentation for local call-clobbered register
variables to say that the only valid sequence using them is from a
non-inlinable function that contains only direct initializations of the
register variables from constants or parameters.

Or go one step further and deprecate local register variables altogether
(they IMHO don't make much sense, and rather the targets should provide
a way to properly constrain asm inputs and outputs).

Richard.


Re: libgcc: strange optimization

2011-08-02 Thread Georg-Johann Lay
Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson  wrote:
>> Michael Walle writes:
>>  >
>>  > Hi,
>>  >
>>  > > To confirm that try -fno-tree-ter.
>>  >
>>  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
>>  > assembly code:
>>  >
>>  > f2:
>>  >      addi     sp, sp, -4
>>  >      sw       (sp+4), ra
>>  >      addi     r2, r0, 10
>>  >      calli    __ashrsi3
>>  >      addi     r8, r0, 10
>>  >      scall
>>  >      lw       ra, (sp+4)
>>  >      addi     sp, sp, 4
>>  >      b        ra
>>
>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
> 
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
> 
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
> 
> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

Strongly oppose.

Local register variables are very useful; maybe not on a linux machine
but on embedded systems there are situations you cannot do without.

Have you ever counted the constraint alternatives that would be needed?

You'd need different constraints for QI/HI/SI/DI for each register
resulting in myriads of register classes increasing register allocation
time and dumps would become impossible to read.  I once tried it for
a target...never again.

Besides that, with local register vars the developer can write code that
meets his requirements, whereas for constraints you will always have to
change the compiler and make existing sources incompatible.

If GCC provides such a feature it must work properly and not be sacrificed
for this or that optimization.

Correct code is always preferred over non-functional code.

Johann

> 
> Richard.
> 



Re: libgcc: strange optimization

2011-08-02 Thread Hans-Peter Nilsson
On Tue, 2 Aug 2011, Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson  wrote:
> > Michael Walle writes:
> >  >
> >  > Hi,
> >  >
> >  > > To confirm that try -fno-tree-ter.
> >  >
> >  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
> >  > assembly code:
> >  >
> >  > f2:
> >  >      addi     sp, sp, -4
> >  >      sw       (sp+4), ra
> >  >      addi     r2, r0, 10
> >  >      calli    __ashrsi3
> >  >      addi     r8, r0, 10
> >  >      scall
> >  >      lw       ra, (sp+4)
> >  >      addi     sp, sp, 4
> >  >      b        ra
> >
> > -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
>
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.

I'd be ok with that, FWIW; I see the problem with keeping the
scheduling of operations in a working order (yuck) and I don't
see how else to keep it working ...except perhaps make gcc flag
functions with register asms as non-inlinable, maybe even flag
down any of the dangerous re-scheduling?

Maybe I can do that with some hand-holding?

> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

They do make sense when implementing e.g. system calls, and
they're documented to work as discussed.  (I almost regret
making that happen, though.)  Fortunately such functions are
small, and not relatively much helped by inlining (it's a
*syscall*; much more happening beyond the call than is affected
by inlining some parameter initialization).  Sure, new targets
are much better off by implementing that through other means,
but preferably intrinsic functions to asms.

brgds, H-P

Revision 176335 (removal of #include in thr-posix.h) cause numerous compile failures

2011-08-02 Thread Markus Trippelsdorf
Revision 176335 removed the traditional "#include " from
gthr-posix.h. This breaks the build of many programs (Firefox, Chromium,
etc.) that implicitly rely on it. 
I'm not sure that the gain is worth the pain in this case.

-- 
Markus


Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther
On Tue, Aug 2, 2011 at 2:53 PM, Hans-Peter Nilsson  wrote:
> On Tue, 2 Aug 2011, Richard Guenther wrote:
>> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson  wrote:
>> > Michael Walle writes:
>> >  >
>> >  > Hi,
>> >  >
>> >  > > To confirm that try -fno-tree-ter.
>> >  >
>> >  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following 
>> > working
>> >  > assembly code:
>> >  >
>> >  > f2:
>> >  >      addi     sp, sp, -4
>> >  >      sw       (sp+4), ra
>> >  >      addi     r2, r0, 10
>> >  >      calli    __ashrsi3
>> >  >      addi     r8, r0, 10
>> >  >      scall
>> >  >      lw       ra, (sp+4)
>> >  >      addi     sp, sp, 4
>> >  >      b        ra
>> >
>> > -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>>
>> It's of course only a workaround, not a real fix as nothing prevents
>> other optimizers from performing the re-scheduling TER does.
>>
>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>
> I'd be ok with that, FWIW; I see the problem with keeping the
> scheduling of operations in a working order (yuck) and I don't
> see how else to keep it working ...except perhaps make gcc flag
> functions with register asms as non-inlinable, maybe even flag
> down any of the dangerous re-scheduling?

But then can't people use a pure assembler stub instead?  Without
inlining there isn't much benefit left from writing

 void f1(int arg)
 {
  register int a1 asm("r8") = 10;
  register int a2 asm("r1") = arg;

  asm("scall" : : "r"(a1), "r"(a2));
 }

instead of

f1:
 mov r8, 10
 mov r1, rX
 scall
 ret

in a .s file no?  I doubt much prologue/epilogue is needed.

Or even write

void f1(int arg)
{
 asm("mov r8, %0; mov r1, %1; scall;" : : "g"(10), "g"(arg) : "r8", "r1");
}

which should be inlinable again (yes, in inlined form not optimally
register-allocated, but compared to the non-inline routine?).

Richard.

> Maybe I can do that with some hand-holding?
>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> They do make sense when implementing e.g. system calls, and
> they're documented to work as discussed.  (I almost regret
> making that happen, though.)  Fortunately such functions are
> small, and not relatively much helped by inlining (it's a
> *syscall*; much more happening beyond the call than is affected
> by inlining some parameter initialization).  Sure, new targets
> are much better off by implementing that through other means,
> but preferably intrinsic functions to asms.
>
> brgds, H-P


Re: Revision 176335 (removal of #include in thr-posix.h) cause numerous compile failures

2011-08-02 Thread Jakub Jelinek
On Tue, Aug 02, 2011 at 03:08:03PM +0200, Markus Trippelsdorf wrote:
> Revisions 176335 removed the traditional "#include " from
> gthr-posix.h. This breaks the build of many programs (Firefox, Chromium,
> etc.) that implicitly rely on it. 

This isn't the first time the libstdc++ headers were cleaned up, and
each time there are dozens of programs that need to be fixed up.
Each time they just were fixed.

> I'm not sure that the gain is worth the pain in this case.

It certainly is.

Jakub


Re: libgcc: strange optimization

2011-08-02 Thread Hans-Peter Nilsson
On Tue, 2 Aug 2011, Richard Guenther wrote:
> > I'd be ok with that, FWIW; I see the problem with keeping the
> > scheduling of operations in a working order (yuck) and I don't
> > see how else to keep it working ...except perhaps make gcc flag
> > functions with register asms as non-inlinable, maybe even flag
> > down any of the dangerous re-scheduling?
>
> But then can't people use a pure assembler stub instead?

I see your point, but you're thinking new code, I'm thinking
let's keep existing code used by several targets and documented
as working the last seven years working.  Maybe breakable with a
major release; in gcc-5.  (Oh no, I see what's coming. :)

brgds, H-P


Re: libgcc: strange optimization

2011-08-02 Thread Ian Lance Taylor
Richard Guenther  writes:

> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

No, local register variables are documented as working and many programs
rely on them.  They are a straightforward way to get an asm argument in
a specific register, and I don't see any reason to break that.

> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.

Let's just implement those requirements in the compiler itself.

Ian


Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther
On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor  wrote:
> Richard Guenther  writes:
>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> No, local register variables are documented as working and many programs
> rely on them.  They are a straightforward way to get an asm argument in
> a specific register, and I don't see any reason to break that.

Well, maybe they look like so.  But in reality there is _no_ connection
from the register setup to the actual asm.  Which is the problem the
compiler faces here (apart from the libcall issue).  If there should be
an implicit dependence of all asms to all local register var setters
and users then this isn't implemented on gimple (or rather it works
by chance there as we treat register vars as memory and do not
disambiguate anything across asms (yet)).

>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>
> Let's just implement those requirements in the compiler itself.

Doesn't work for existing code, no?  And if thinking new code then
I'd rather have explicit dependences (and a way to represent them).
Thus, for example

asm ("scall" : : "asm("r0")" (10), ...)

thus, why force new constraints when we already can figure out
local register vars by register name?  Why not extend the constraint
syntax somehow to allow specifying the same effect?

Richard.

> Ian
>


Re: Revision 176335 (removal of #include in thr-posix.h) cause numerous compile failures

2011-08-02 Thread Jonathan Wakely
On 2 August 2011 14:08, Markus Trippelsdorf wrote:
> Revisions 176335 removed the traditional "#include " from
> gthr-posix.h. This breaks the build of many programs (Firefox, Chromium,
> etc.) that implicitly rely on it.
> I'm not sure that the gain is worth the pain in this case.

The "pain" fixes a real bug in libstdc++ which caused g++ to reject
valid programs.

Not rejecting valid programs is more important than accepting invalid
ones, especially ones that can be easily fixed by adding a missing
header.


Re: libgcc: strange optimization

2011-08-02 Thread Ian Lance Taylor
Richard Guenther  writes:

> On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor  wrote:
>> Richard Guenther  writes:
>>
>>> Or go one step further and deprecate local register variables altogether
>>> (they IMHO don't make much sense, and rather the targets should provide
>>> a way to properly constrain asm inputs and outputs).
>>
>> No, local register variables are documented as working and many programs
>> rely on them.  They are a straightforward way to get an asm argument in
>> a specific register, and I don't see any reason to break that.
>
> Well, maybe they look like so.  But in reality there is _no_ connection
> from the register setup to the actual asm.  Which is the problem the
> compiler faces here (apart from the libcall issue).  If there should be
> an implicit dependence of all asms to all local register var setters
> and users then this isn't implemented on gimple (or rather it works
> by chance there as we treat register vars as memory and do not
> disambiguate anything across asms (yet)).

I'm not sure why we need to do anything at the GIMPLE level other than
disable some optimizations.  There is a connection from the register
variable to the asm--the asm refers to the variable.  There is nothing
specific about the register in there, but at the GIMPLE level there
doesn't have to be.

We should not break a useful existing feature because we find it
inconvenient.  Let's just disable some optimizations so that it
continues to work.


>>> I suggest to amend the documentation for local call-clobbered register
>>> variables to say that the only valid sequence using them is from a
>>> non-inlinable function that contains only direct initializations of the
>>> register variables from constants or parameters.
>>
>> Let's just implement those requirements in the compiler itself.
>
> Doesn't work for existing code, no?

Why not?


> And if thinking new code then
> I'd rather have explicit dependences (and a way to represent them).
> Thus, for example
>
> asm ("scall" : : "asm("r0")" (10), ...)
>
> thus, why force new constraints when we already can figure out
> local register vars by register name?  Why not extend the constraint
> syntax somehow to allow specifying the same effect?

I agree that it would be a good idea to permit asms to indicate the
specific register the operand should go into.

Ian


Re: libgcc: strange optimization

2011-08-02 Thread Richard Henderson
On 08/02/2011 05:22 AM, Richard Guenther wrote:
>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
> 
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
> 
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
> 
> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

Neither of these is a viable option.

What we might be able to do is throttle TER when the destination
is a local register variable.  This should unbreak the common case
of local regs immediately surrounding an asm.


r~


Re: libgcc: strange optimization

2011-08-02 Thread Georg-Johann Lay

Richard Guenther wrote:

> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
>
> Richard.


That's completely counterproductive.

If a developer invents asm or local register variables he has a
very good reason for that choice like to meet hard (with hard as
in HARD) real time constraints.  Disabling inlining a function
that uses local register vars would make many places of local
register vars unusable because there would no more be a way to
write down the exact register usage footprint of a piece of
asm.  Typical use-cases are interfacing to a function that has a
smaller register footprint than an ordinary black-box function or
doing some arithmetic that needs special hard regs for which there
are no fitting constraints. All this will be impossible if inlining
is disabled for such functions; then it is no more possible to
describe such a low-overhead piece of code without calling a black box 
function, clobber all call-clobbered registers, render a maybe 
tail-function into a non tail-call function etc.


Embedded systems like ARM or PowerPC based get more and more
important over the years and GCC should put more attention to
them and their needs; not only to bolide servers/PCs.
This includes systems with hard real time constraints.

Johann


Problem in bootstrapping trunk - error in stage 2 -mnolzcnt command line option.

2011-08-02 Thread Toon Moene

What am I doing wrong:

configure:3247: checking for suffix of object files
configure:3269: /home/toon/compilers/obj-t/./gcc/xgcc 
-B/home/toon/compilers/obj-t/./gcc/ 
-B/tmp/lto/x86_64-unknown-linux-gnu/bin/ 
-B/tmp/lto/x86_64-unknown-linux-gnu/lib/ -isystem 
/tmp/lto/x86_64-unknown-linux-gnu/include -isystem 
/tmp/lto/x86_64-unknown-linux-gnu/sys-include    -c -g -O2  conftest.c >&5

cc1: error: unrecognized command line option '-mnolzcnt'

The configure line is:

../gcc/configure --prefix=/tmp/lto --enable-languages=c++ 
--with-build-config=bootstrap-lto --with-gnu-ld --disable-multilib 
--disable-nls --with-arch=native --with-tune=native && \


Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: Problem in bootstrapping trunk - error in stage 2 -mnolzcnt command line option.

2011-08-02 Thread H.J. Lu
On Tue, Aug 2, 2011 at 11:24 AM, Toon Moene  wrote:
> What am I doing wrong:
>
> configure:3247: checking for suffix of object files
> configure:3269: /home/toon/compilers/obj-t/./gcc/xgcc
> -B/home/toon/compilers/obj-t/./gcc/ -B/tmp/lto/x86_64-unknown-linux-gnu/bin/
> -B/tmp/lto/x86_64-unknown-linux-gnu/lib/ -isystem
> /tmp/lto/x86_64-unknown-linux-gnu/include -isystem
> /tmp/lto/x86_64-unknown-linux-gnu/sys-include    -c -g -O2  conftest.c >&5
> cc1: error: unrecognized command line option '-mnolzcnt'
>
> The configure line is:
>
> ../gcc/configure --prefix=/tmp/lto --enable-languages=c++
> --with-build-config=bootstrap-lto --with-gnu-ld --disable-multilib
> --disable-nls --with-arch=native --with-tune=native && \
>

I checked in this as an obvious fix.

Thanks.


-- 
H.J.
---
Index: ChangeLog
===
--- ChangeLog   (revision 177203)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2011-08-02  H.J. Lu  
+
+   * config/i386/driver-i386.c (host_detect_local_cpu): Fix a typo.
+
 2011-08-02  Richard Henderson  

PR target/49878
Index: config/i386/driver-i386.c
===
--- config/i386/driver-i386.c   (revision 177203)
+++ config/i386/driver-i386.c   (working copy)
@@ -718,7 +718,7 @@ const char *host_detect_local_cpu (int a
   const char *avx = has_avx ? " -mavx" : " -mno-avx";
   const char *sse4_2 = has_sse4_2 ? " -msse4.2" : " -mno-sse4.2";
   const char *sse4_1 = has_sse4_1 ? " -msse4.1" : " -mno-sse4.1";
-  const char *lzcnt = has_lzcnt ? " -mlzcnt" : " -mnolzcnt";
+  const char *lzcnt = has_lzcnt ? " -mlzcnt" : " -mno-lzcnt";

   options = concat (options, cx16, sahf, movbe, ase, pclmul,
popcnt, abm, lwp, fma, fma4, xop, bmi, tbm,
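
The typo fixed above is easy to make because each option string is spelled out twice by hand. A minimal sketch of one way to avoid that class of mistake, assuming a small formatting helper: feature_option is a hypothetical name for illustration only and is not part of driver-i386.c, which builds these strings from literals as the diff shows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from driver-i386.c: writes " -m<name>"
   when the feature is present and " -mno-<name>" otherwise, so the
   negative spelling is derived rather than typed by hand.  */
static const char *
feature_option (char *buf, size_t size, const char *name, int present)
{
  /* " -mno-" is 6 characters, plus the name, plus the terminating NUL.  */
  assert (size >= strlen (name) + 7);
  snprintf (buf, size, " -m%s%s", present ? "" : "no-", name);
  return buf;
}
```

With this shape, feature_option (buf, sizeof buf, "lzcnt", 0) yields " -mno-lzcnt", the spelling that cc1 accepts.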


Re: libgcc: strange optimization

2011-08-02 Thread Richard Guenther
On Tue, Aug 2, 2011 at 6:02 PM, Richard Henderson  wrote:
> On 08/02/2011 05:22 AM, Richard Guenther wrote:
>>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>>
>> It's of course only a workaround, not a real fix as nothing prevents
>> other optimizers from performing the re-scheduling TER does.
>>
>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> Neither of these is a viable option.
>
> What we might be able to do is throttle TER when the destination
> is a local register variable.  This should unbreak the common case
> of local regs immediately surrounding an asm.

Sure, similar to disabling TER for functions containing such vars.
But it isn't a solution for the general issue that nothing prevents
scheduling gimple statements between register variable def and use.

Richard.

>
> r~
>
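
A minimal sketch of the "local regs immediately surrounding an asm" pattern discussed above, assuming an x86-64 GNU C compiler for the asm path, with a portable fallback so the sketch still builds elsewhere. The function names, the choice of r8/r9 and the add instruction are illustrative assumptions, not taken from the lm32 code in this thread. Everything that might expand to a call is evaluated into ordinary temporaries first, and the register variables are initialized from those temporaries right before the asm. This narrows the window in which a call could clobber them; as noted above, it is not a guarantee against every possible rescheduling.

```c
#include <assert.h>

/* Hypothetical stand-in for any computation that may become a call; on a
   target without a barrel shifter the shift could expand to a libgcc
   routine such as __ashrsi3.  */
static long
scale (long v)
{
  assert (v >= 0);  /* right-shifting a negative value is implementation-defined */
  return v >> 10;
}

static long
sum_in_fixed_regs (long x, long y)
{
  /* Evaluate anything that could turn into a call first ...  */
  long t1 = scale (x);
  long t2 = y;
#if defined (__GNUC__) && defined (__x86_64__)
  /* ... then load the register variables from plain temporaries,
     directly before the asm that consumes them.  */
  register long a __asm__ ("r8") = t1;
  register long b __asm__ ("r9") = t2;
  __asm__ ("add %1, %0" : "+r" (a) : "r" (b) : "cc");
  return a;
#else
  return t1 + t2;   /* portable fallback without register variables */
#endif
}
```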


ANN: gcc-python-plugin 0.6

2011-08-02 Thread David Malcolm
gcc-python-plugin is a plugin for GCC 4.6 onwards which embeds the
CPython interpreter within GCC, allowing you to write new compiler
warnings in Python, generate code visualizations, etc.

Tarball releases are available at:
  https://fedorahosted.org/releases/g/c/gcc-python-plugin/

Prebuilt-documentation can be seen at:
  http://readthedocs.org/docs/gcc-python-plugin/en/latest/index.html

Project homepage:
  https://fedorahosted.org/gcc-python-plugin/

What's new in v0.6:

  - It's now possible to create new gcc passes in Python, rather than
just wire up callbacks to existing passes.  See:
http://readthedocs.org/docs/gcc-python-plugin/en/latest/passes.html#creating-new-optimization-passes

  - GCC's callgraph information is now visible via a Python API.  See:
http://readthedocs.org/docs/gcc-python-plugin/en/latest/callgraph.html

  - GCC's Register Transfer Language is now visible via a Python API
(albeit a rather primitive one so far).

  - Improvements to Python 3 support (the selftests are now built and
run against the version of Python that the plugin was compiled against,
rather than hardcoding Python 2.7; various Python 3 and --with-py-debug
compatibility issues that this shook out are now fixed).

  - Various other minor fixes


Enjoy!
Dave




gcc-4.4-20110802 is now available

2011-08-02 Thread gccadmin
Snapshot gcc-4.4-20110802 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20110802/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 177219

You'll find:

 gcc-4.4-20110802.tar.bz2 Complete GCC

  MD5=5f0ee4e2d2e6d516442e1bf200694d38
  SHA1=d67e299b15edb6f7f90e95bdd172c83c269aa3bb

Diffs from 4.4-20110726 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: libgcc: strange optimization

2011-08-02 Thread Miles Bader
Richard Guenther  writes:
> But then can't people use a pure assembler stub instead?  Without
> inlining there isn't much benefit left from writing
>
>  void f1(int arg)
>  {
>   register int a1 asm("r8") = 10;
>   register int a2 asm("r1") = arg;
>
>   asm("scall" : : "r"(a1), "r"(a2));
>  }
>
> instead of
>
> f1:
>  mov r8, 10
>  mov r1, rX
>  scall
>  ret
>
> in a .s file no?  I doubt much prologue/epilogue is needed.
>
> Or even write
>
> void f1(int arg)
> {
>  asm("mov r8, %0; mov r1, %1; scall;" : : "g"(10), "g"(arg) : "r8", "r1");
> }

Of course in practice people _do_ want to use it with f1 inlined, where
using reg variables (or alternatively, some expanded constraint language
for the asm parameters) can really get rid of tons of unnecessary asm
moves, and they want the compiler to guard against conflicts.

-Miles
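
For comparison, a sketch of what the "constraint language" alternative looks like on a target that already has single-register constraints. x86 is used here purely as an illustration, since its "a" and "d" constraints name EAX and EDX; the lm32 port discussed in this thread apparently has no equivalent, which is why register variables are used there. divmod_u32 is a hypothetical name, and the non-x86 branch is a plain C fallback.

```c
#include <assert.h>

/* Unsigned 32-bit divide that returns the quotient and stores the
   remainder.  Illustrative only.  */
static unsigned
divmod_u32 (unsigned n, unsigned d, unsigned *rem)
{
  assert (d != 0);
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  unsigned q, r;
  /* DIV divides EDX:EAX by its operand, leaving the quotient in EAX and
     the remainder in EDX.  "=a" and "=d" pin the outputs to those
     registers; "0" and "1" place the inputs in the same registers.  */
  __asm__ ("divl %4"
           : "=a" (q), "=d" (r)
           : "0" (n), "1" (0u), "r" (d)
           : "cc");
  *rem = r;
  return q;
#else
  *rem = n % d;
  return n / d;
#endif
}
```

Because the register requirements are part of the asm operands themselves, the compiler can inline such a function and still keep the values in place, with no separate register variable whose definition could be scheduled away from its use.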

-- 
"Whatever you do will be insignificant, but it is very important that
 you do it."  Mahatma Gandhi