Re: detailed comparison of generated code size for GCC and other compilers

2009-12-15 Thread Paolo Bonzini

On 12/14/2009 09:31 PM, John Regehr wrote:

Ok, thanks for the feedback Andi.  Incidentally, the LLVM folks seem to
agree with both of your suggestions. I'll re-run everything w/o frame
pointers and ignoring testcases where some compiler warns about use of
uninitialized local. I hate the way these warnings are not totally
reliable, but realistically if GCC catches most cases (which it almost
certainly will) the ones that slip past won't be too much of a problem.


I also wonder if you have something like LTO enabled.  This function 
produces completely bogus code in LLVM, presumably because some kind of 
LTO proves that CC1000SendReceiveP is never written.  Of course, this 
assumption would be wrong at runtime in a real program.


http://embed.cs.utah.edu/embarrassing/src_harvested_dec_09/015306.c

Of course the answer is not to disable LTO, but rather to add an 
"initializer" function that does


volatile void *p;
memcpy (CC1000SendReceiveP__f, p, sizeof (CC1000SendReceiveP__f));
memcpy (CC1000SendReceiveP__count, p, sizeof (CC1000SendReceiveP__count));
memcpy (CC1000SendReceiveP__rxBuf, p, sizeof (CC1000SendReceiveP__rxBuf));

... and to make all variables non-static (otherwise the initializer 
would have to be in the same file, but that would perturb your results).
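
A minimal sketch of such an initializer, assuming it sits in its own translation unit. The `sketch_` names, types, and sizes are hypothetical stand-ins, since the real declarations of the `CC1000SendReceiveP__*` variables are not shown in this thread, and the bytes come from a caller-supplied buffer rather than an uninitialized pointer so that the sketch itself stays well defined:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the non-static globals in the harvested file;
   the real types and sizes are not shown in this thread. */
uint8_t sketch_f;
uint16_t sketch_count;
uint8_t sketch_rxBuf[4];

/* "Initializer" meant for a separate translation unit: the globals are
   externally visible and are written here from caller-supplied bytes, so
   a compiler cannot assume they are never written. */
void sketch_initialize(const unsigned char *src)
{
    assert(src != NULL);
    memcpy(&sketch_f, src, sizeof sketch_f);
    src += sizeof sketch_f;
    memcpy(&sketch_count, src, sizeof sketch_count);
    src += sizeof sketch_count;
    memcpy(sketch_rxBuf, src, sizeof sketch_rxBuf);
}
```

The caller has to supply at least 7 bytes here (1 + 2 + 4); in a size benchmark the function would never need to be called, it only has to exist.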


I also agree with others that the frame pointer default is special 
enough to warrant adding a special -f option to compilers that generate 
it, if some other compilers do not generate it.


Paolo



Re: detailed comparison of generated code size for GCC and other compilers

2009-12-15 Thread Chris Lattner

On Dec 15, 2009, at 12:28 AM, Paolo Bonzini wrote:

> On 12/14/2009 09:31 PM, John Regehr wrote:
>> Ok, thanks for the feedback Andi.  Incidentally, the LLVM folks seem to
>> agree with both of your suggestions. I'll re-run everything w/o frame
>> pointers and ignoring testcases where some compiler warns about use of
>> uninitialized local. I hate the way these warnings are not totally
>> reliable, but realistically if GCC catches most cases (which it almost
>> certainly will) the ones that slip past won't be too much of a problem.
> 
> I also wonder if you have something like LTO enabled.

No, he doesn't enable LLVM LTO.  Even if he did, LTO wouldn't touch the 
'CC1000SendReceiveP*' definitions because they are not static (unless he 
explicitly built with an export map).

I haven't analyzed what is going on in this example though.  The code is 
probably using some undefined behavior and getting zapped.

-Chris

>  This function produces completely bogus code in LLVM, presumably because 
> some kind of LTO proves that CC1000SendReceiveP is never written.  Of course, 
> this assumption would be wrong at runtime in a real program.
> 
> http://embed.cs.utah.edu/embarrassing/src_harvested_dec_09/015306.c
> 
> Of course the answer is not to disable LTO, but rather to add an 
> "initializer" function that does
> 
> volatile void *p;
> memcpy (CC1000SendReceiveP__f, p, sizeof (CC1000SendReceiveP__f));
> memcpy (CC1000SendReceiveP__count, p, sizeof (CC1000SendReceiveP__count));
> memcpy (CC1000SendReceiveP__rxBuf, p, sizeof (CC1000SendReceiveP__rxBuf));
> 
> ... and to make all variables non-static (otherwise the initializer would 
> have to be in the same file, but that would perturb your results).
> 
> I also agree with others that the frame pointer default is special enough to 
> warrant adding a special -f option to compilers that generate it, if some 
> other compilers do not generate it.
> 
> Paolo
> 



an array's offset from the stack pointer

2009-12-15 Thread Jianzhang Peng
Hi,
Can I get an array's offset from the stack pointer at pass-sched?

thanks
-- 
Jianzhang Peng


Re: detailed comparison of generated code size for GCC and other compilers

2009-12-15 Thread Paolo Bonzini



I also wonder if you have something like LTO enabled.


No, he doesn't enable LLVM LTO.  Even if he did, LTO wouldn't touch
the 'CC1000SendReceiveP*' definitions because they are not static
(unless he explicitly built with an export map).


Interesting.


I haven't analyzed what is going on in this example though.  The
code is probably using some undefined behavior and getting zapped.


This access is being eliminated:

 _cil_inline_tmp_23 =
   ((unsigned int)
*((uint8_t const *) ((void const *) ((unsigned char *) 0U)) +
 1) << 8) | (unsigned int) *((uint8_t const *) ((void const *)

GCC generates

movzbl  1, %eax
movzbl  0, %edx
sall    $8, %eax
orl     %edx, %eax
ret

so probably LLVM is eliminating NULL pointer accesses, or something like 
that.  This is the undefined behavior.
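
For comparison, here is a well-defined version of the same byte combination, as a sketch only. The function name `load_le16` is made up for illustration; it takes a valid pointer instead of reading from address 0, and otherwise mirrors what the GCC output above does (byte 1 shifted left by 8, ORed with byte 0):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine two bytes the way the harvested expression does:
   byte 1 goes to bits 8..15, byte 0 to bits 0..7. */
unsigned int load_le16(const uint8_t *p)
{
    assert(p != NULL);  /* the harvested code read from address 0 instead */
    return ((unsigned int)p[1] << 8) | (unsigned int)p[0];
}
```

With a buffer holding the bytes 0x34, 0x12, the function returns 0x1234.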


Thanks for following up.

Paolo


Re: Performance regression of generated numerical code

2009-12-15 Thread Martin Reinecke

Hi!


You didn't say what target you are using. Pentium D can run both 32bit
and 64bit code.


This was done with 32bit code. I have opened PR 42376 describing
the issue and added some more information.

Cheers,
  Martin


Re: CFI statements vs. -pg

2009-12-15 Thread Richard Earnshaw

On Mon, 2009-12-14 at 19:18 +0100, Thomas Schwinge wrote:
> Hello!
> 
> I noticed the following on ARM, GCC trunk -- didn't check yet whether it
> is ARM-specific; may be a general issue.
> 
> Hacking out the forcing-off of emitting CFI statements in arm.c, I see
> the following function prologue emitted (-O -g):
> 
> .text
> .Ltext0:
> .cfi_sections   .debug_frame
> .align  2
> .global foo
> .type   foo, %function
> foo:
> .LFB0:
> .file 1 "c.c"
> .loc 1 2 0
> .cfi_startproc
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> stmfd   sp!, {r3, lr}
> .LCFI0:
> .cfi_def_cfa_offset 8
> .loc 1 4 0
> mov r0, #33
> .cfi_offset 14, -4
> .cfi_offset 3, -8
> bl  bar
> [...]
> 
> Comparing this to -pg:
> 
>  .LCFI0:
> .cfi_def_cfa_offset 8
> +   push    {lr}
> +   bl  __gnu_mcount_nc
> .loc 1 4 0
> mov r0, #33
> 
> Shouldn't ``.cfi_adjust_cfa_offset 4'' or equivalent be emitted, too?  If
> I'm interpreting the .debug_frame correctly that is generated directly by
> GCC without using CFI statements, it seems to have the same problem.  Or
> am I misunderstanding something?

__gnu_mcount_nc is magic, it will pop that stack value before returning;
so while there's a slight inconsistency for those two instructions,
everything will be correct for the main body of the function.
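
The arithmetic behind that "slight inconsistency" can be written out as a toy model. This is not GCC code; the struct and function names are invented for illustration, and the byte counts are the ones from the listing above (8 bytes for the two-register stmfd, 4 for the extra push):

```c
#include <assert.h>

/* Toy model, not GCC code: compare the CFA offset that the CFI directives
   describe with the number of bytes really pushed since function entry. */
struct cfa_model {
    int described;  /* offset last stated by .cfi_def_cfa_offset */
    int actual;     /* bytes actually pushed since function entry */
};

/* stmfd sp!, {r3, lr} followed by .cfi_def_cfa_offset 8 */
void model_prologue(struct cfa_model *m)
{
    m->actual = 8;
    m->described = 8;
}

/* push {lr} emitted for -pg, with no matching CFI directive */
void model_push_lr(struct cfa_model *m)
{
    m->actual += 4;
}

/* __gnu_mcount_nc pops the extra word before it returns */
void model_mcount_return(struct cfa_model *m)
{
    assert(m->actual >= 4);
    m->actual -= 4;
}

/* Number of bytes by which the CFI description is off. */
int model_mismatch(const struct cfa_model *m)
{
    return m->actual - m->described;
}
```

Running prologue, push, and return in that order gives a mismatch of 0, then 4, then 0 again: the description is only wrong between the push and the pop inside __gnu_mcount_nc.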

I'm not sure what other architectures do in this case.  Do they also put
out adjustments to the cfi?

Anyway, this isn't the right place for this; could you raise a bug report
in bugzilla please?

R.





Re: detailed comparison of generated code size for GCC and other compilers

2009-12-15 Thread Andi Kleen
John Regehr  writes:

>> I would only be worried for cases where no warning is issued *and*
>> uninitialized accesses are eliminated.
>
> Yeah, it would be excellent if GCC maintained the invariant that for
> all uses of uninitialized storage, either the compiler or else
> valgrind will issue a warning.

My understanding was that valgrind's detection of uninitialized
local variables is not 100% reliable because it cannot track
all updates of the frames (it's difficult to distinguish stack
reuse from uninitialized stack).

e.g. 

int f1() { int x; return x; } 
int f2() { int x; return x; } 

int main(void)
{
f1();
f2();
return 0;
}

compiled without optimization so that the variables stay around
still gives no warning in valgrind:

==22573== Memcheck, a memory error detector
==22573== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==22573== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==22573== Command: ./a.out
==22573== 
==22573== 
==22573== HEAP SUMMARY:
==22573== in use at exit: 0 bytes in 0 blocks
==22573==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==22573== 
==22573== All heap blocks were freed -- no leaks are possible
==22573== 
==22573== For counts of detected and suppressed errors, rerun with: -v
==22573== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 5)

On the other hand the compiler tends to warn too much for
uninitialized variables, typically because it cannot handle something
like that:

void f(int flag)
{
int local;
if (flag)
... initialize local 
...

if (flag)
... use local 
}
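
A compilable version of that pattern, as a sketch. The function name and the doubling are made up for illustration; whether a particular GCC version and optimization level warns about `local` here is not something the sketch claims:

```c
#include <assert.h>

/* `local` is written and read only when `flag` is nonzero, so no path
   reads it uninitialized, yet proving that requires correlating the
   two tests of `flag`. */
int guarded_use(int flag, int value)
{
    int local;

    if (flag)
        local = value * 2;   /* initialize local */

    /* unrelated work could sit here */

    if (flag)
        return local;        /* use local */
    return 0;
}
```

Calling guarded_use(1, 21) yields 42 and guarded_use(0, 21) yields 0; neither call touches an uninitialized value.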

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: detailed comparison of generated code size for GCC and other compilers

2009-12-15 Thread Mathieu Lacage
On Tue, 2009-12-15 at 11:24 +0100, Andi Kleen wrote:
> John Regehr  writes:
> 
> >> I would only be worried for cases where no warning is issued *and*
> >> uninitialized accesses are eliminated.
> >
> > Yeah, it would be excellent if GCC maintained the invariant that for
> > all uses of uninitialized storage, either the compiler or else
> > valgrind will issue a warning.
> 
> My understanding was that valgrind's detection of uninitialized
> local variables is not 100% reliable because it cannot track
> all updates of the frames (it's difficult to distinguish stack
> reuse from uninitialized stack)

I am not a valgrind expert so, take the following with a grain of salt
but I think that the above statement is wrong: valgrind reliably detects
use of uninitialized variables if you define 'use' as meaning 'affects
control flow of your program' in valgrind.

i.e., try this:

[mlac...@diese ~]$ cat > test.c
int f(void)
{
int x;
return x;
}
int main (int argc, char *argv[])
{
if (f())
{
printf ("something\n"); 
}
return 0;
}
^C
[mlac...@diese ~]$ gcc ./test.c
./test.c: In function ‘main’:
./test.c:10: warning: incompatible implicit declaration of built-in function ‘printf’
[mlac...@diese ~]$ valgrind ./a.out 
==18933== Memcheck, a memory error detector.
==18933== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==18933== Using LibVEX rev 1804, a library for dynamic binary translation.
==18933== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==18933== Using valgrind-3.3.0, a dynamic binary instrumentation framework.
==18933== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==18933== For more details, rerun with: -v
==18933== 
==18933== Conditional jump or move depends on uninitialised value(s)
==18933==    at 0x80483D7: main (in /home/mlacage/a.out)
something
==18933== 
==18933== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 1)
==18933== malloc/free: in use at exit: 0 bytes in 0 blocks.
==18933== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==18933== For counts of detected errors, rerun with: -v
==18933== All heap blocks were freed -- no leaks are possible.
[mlac...@diese ~]$





Re: a question about argument ARG_POINTER_REGNUM

2009-12-15 Thread Ivan Shcherbakov
Hi, Ian,
 
ELIMINABLE_REGS and TARGET_CAN_ELIMINATE are set correctly. As far as I
understand from further investigation, at some point during compilation
the argument pointer register is used, then expand_prologue() produces
INSNs including "push argp" (as "argp" is presently defined as a
general-purpose non-scratch register). Then, during the reload phase,
such an instruction prevents "argp" from being eliminated, as it modifies
SP and uses ARGP (ref_outside_mem) at the same time:

  if (ep->previous_offset != ep->offset && ep->ref_outside_mem)
    ep->can_eliminate = 0;
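
To make the effect of that check concrete, here is a simplified stand-alone model. The struct is a hypothetical stand-in that keeps only the four fields named in the snippet; it is not the real elimination table from reload, and the field comments are a reading of the names rather than quotes from the GCC sources:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one elimination entry; only the fields named
   in the quoted snippet are modeled. */
struct elim_sketch {
    int previous_offset;  /* offset before the current insn */
    int offset;           /* offset after the current insn */
    int ref_outside_mem;  /* nonzero if a reference outside a MEM was seen */
    int can_eliminate;    /* cleared once elimination is ruled out */
};

/* Apply the quoted check to one entry. */
void check_elimination(struct elim_sketch *ep)
{
    assert(ep != NULL);
    if (ep->previous_offset != ep->offset && ep->ref_outside_mem)
        ep->can_eliminate = 0;
}
```

An insn like "push argp" both changes the offset (it moves SP) and references the register outside a MEM, so an entry such as {0, 2, 1, 1} comes back with can_eliminate cleared, while an entry whose offset did not change is left alone.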


Is there a way to prevent prologue instructions (e.g. saving ARGP/FP
before the elimination phase) from affecting elimination? Maybe
expand_prologue() should behave differently depending on whether the
elimination has completed?

-- 
Best regards,
 Ivan Shcherbakov mailto:shcherba...@eit.uni-kl.de
 TU Kaiserslautern, Germany
 Department of Real-Time Systems



Re: detailed comparison of generated code size for GCC and other compilers

2009-12-15 Thread Andi Kleen
> I am not a valgrind expert so, take the following with a grain of salt
> but I think that the above statement is wrong: valgrind reliably detects
> use of uninitialized variables if you define 'use' as meaning 'affects
> control flow of your program' in valgrind.

It works in some cases for the stack, but not in all. Consider the redzone 
on the x86-64 ABI. How should valgrind distinguish an uninitialized redzone
variable from an initialized one if the stack has been used before? I didn't 
even think it worked in all cases for variables in the real frame.

You're right, my example was bogus because it didn't test the control flow.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [RFC] LTO and debug information

2009-12-15 Thread Diego Novillo
On Sun, Dec 13, 2009 at 15:51, Richard Guenther  wrote:

> + /* ???  We could free non-constant DECL_SIZE, DECL_SIZE_UNIT
> +    and DECL_FIELD_OFFSET.  But it's cheap enough to not do
> +    that and refrain from adding workarounds to dwarf2out.c  */
> +
> + /* DECL_FCONTEXT is only used for debug info generation.  */
> + if (TREE_CODE (decl) == FIELD_DECL
> +     && debug_info_level < DINFO_LEVEL_TERSE)
> +   DECL_FCONTEXT (decl) = NULL_TREE;
> +

Yes, keeping a reminder here will help future work in early debug
generation.  As for the rest of the patch, anything that gets us
closer to a working -g -flto is better than nothing.  Not clearing
these fields should not be a big deal for now.

The patch is OK with the reminders that Michael suggested.


Diego.


Re: New RTL instruction for my port

2009-12-15 Thread Jean Christophe Beyler
You are correct. So I should be changing things in the adjust_cost
function instead.

I was also wondering, these instructions modify an internal register
that has been set as a fixed register. However, the compiler optimizes
them out when the accumulator is not retrieved for a calculation. How
can I tell the compiler that it should not remove these instructions.

Here is an example code:

uint64_t foo (uint64_t x, uint64_t y)
{
uint64_t z;

__builtin_newins (x,y); /* Modifies the accumulator */

z = __builtin_retrieve_accum (); /* Retrieve the accumulator */

return z;
}

If I remove the instruction "z = ...;", then the compiler will
optimize out my first builtin call.


Thanks for your help and input,
Jc

On Mon, Dec 14, 2009 at 6:10 PM, Daniel Jacobowitz  wrote:
> On Mon, Dec 14, 2009 at 05:52:50PM -0500, Jean Christophe Beyler wrote:
>> I thought of that but then how do I add the cost ? I also have another
>> problem: there is a second instruction that would have the exact same
>> signature if I use an unspec.
>>
>> Is there a solution for that and how do I handle the cost then ?
>>    - Just say that an unspec has a higher cost?
>
> Are you really talking about rtx_costs?  It sounds to me more like you
> want to change your scheduler.
>
> --
> Daniel Jacobowitz
> CodeSourcery
>


Re: a question about argument ARG_POINTER_REGNUM

2009-12-15 Thread Ian Lance Taylor
Ivan Shcherbakov  writes:

> ELIMINABLE_REGS and TARGET_CAN_ELIMINATE are set correctly. As far as I
> understand from further investigation, at some point during compilation
> the argument pointer register is used, then expand_prologue() produces
> INSNs including "push argp" (as "argp" is presently defined as a
> general-purpose non-scratch register).

I don't understand why that would happen.  I thought you said that argp
was marked as a fixed register.  What code is generating that push
instruction?

Ian


Re: New RTL instruction for my port

2009-12-15 Thread Daniel Jacobowitz
On Tue, Dec 15, 2009 at 10:08:02AM -0500, Jean Christophe Beyler wrote:
> You are correct. So I should be changing things in the adjust_cost
> function instead.
> 
> I was also wondering, these instructions modify an internal register
> that has been set as a fixed register. However, the compiler optimizes
> them out when the accumulator is not retrieved for a calculation. How
> can I tell the compiler that it should not remove these instructions.
> 
> Here is an example code:
> 
> uint64_t foo (uint64_t x, uint64_t y)
> {
> uint64_t z;
> 
> __builtin_newins (x,y); /* Modifies the accumulator */
> 
> z = __builtin_retrieve_accum (); /* Retrieve the accumulator */
> 
> return z;
> }
> 
> If I remove the instruction "z = ...;", then the compiler will
> optimize out my first builtin call.

I suppose you could use EPILOGUE_USES to say that changes to the
accumulator should not be discarded.  You could also use
unspec_volatile instead of unspec, but that may further inhibit
optimization.
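
A user-level analogy, not the RTL mechanism itself: GCC documents the same trade-off for extended asm, where a statement with outputs may be discarded if its outputs are unused and `volatile` disables that. The sketch below assumes a GCC-compatible compiler; the function names are made up, and the port-specific builtins from this thread are not reproduced:

```c
#include <assert.h>
#include <stdint.h>

/* Plain asm with an output: if a caller never uses the result, the
   optimizer is free to discard the statement. */
uint32_t touch_plain(uint32_t x)
{
    __asm__("" : "+r"(x));
    return x;
}

/* Volatile asm: the statement is kept even when its result is unused,
   at the price of blocking some optimizations around it. */
uint32_t touch_volatile(uint32_t x)
{
    __asm__ volatile("" : "+r"(x));
    return x;
}
```

Both functions return their argument unchanged; the difference only shows up in the generated code when a caller ignores the return value.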

-- 
Daniel Jacobowitz
CodeSourcery


Re: a question about argument ARG_POINTER_REGNUM

2009-12-15 Thread Ivan Shcherbakov
Hi, Ian,
 
I have created a simpler example, just a function computing a sum of
its arguments:

int sum(int a, int b, int c, int d, int e, int f, int g, int h)
{
return a + b + c + d + e + f + g + h;
}

The "argp" is a pseudo-register included in all register classes that
contain normal general-purpose regs. Here is the assembly output
(msp430-gcc -O0):

sum:
push r4
mov r1, argp 
add #4, argp
sub #8, r1
mov r1, r4 
/* prologue ends here (frame size = 8) */
.L__FrameSize_sum=0x8
.L__FrameOffset_sum=0xa
mov r15, @r4 
mov r14, 2(r4) 
mov r13, 4(r4) 
mov r12, 6(r4) 
mov @r4, r15 
add 2(r4), r15
add 4(r4), r15
add 6(r4), r15
add @r16, r15
add 2(r16), r15
add 4(r16), r15
add 6(r16), r15

/* epilogue: frame size = 8 */
add #8, r1
pop r4
ret
.Lfe1:
.size   sum,.Lfe1-sum
;; End of function 

As you can see, argp does not get eliminated. During the reload phase,
all pseudo-registers have reg_equiv_mem[i] and reg_equiv_address[i]
set to NULL, which prevents elimination from being executed.

It looks like some transformation before elimination has already
substituted "argp+offset" values into INSNs, preventing them from
being processed by the elimination procedure. Here is a short example
from the test.c.172r.ira dump file (which should correspond to the
pre-reload pass):
(insn 12 11 13 2 2.c:3 (set (reg:HI 15 r15 [orig:24 D.1206 ] [24])
(plus:HI (reg:HI 15 r15 [orig:25 D.1205 ] [25])
(mem/c/i:HI (reg/f:HI 16 argp) [0 e+0 S2 A16]))) 73 {*addhi3_3} 
(nil))

Presently I am trying to compare RTL dumps from different passes with
the ones produced by i386-gcc. It seems that in x86 the argp register gets
eliminated before the reload phase. However, on my msp430 port this
does not happen for some reason.

Do you have any ideas what other code can be responsible for
eliminating argp?

-- 
Best regards,
 Ivan Shcherbakov mailto:shcherba...@eit.uni-kl.de
 TU Kaiserslautern, Germany
 Department of Real-Time Systems



Re: a question about argument ARG_POINTER_REGNUM

2009-12-15 Thread Ian Lance Taylor
Ivan Shcherbakov  writes:

> It seems that in x86 the argp register gets
> eliminated before the reload phase.

That seems unlikely to me.  What pass do you think is eliminating the
argument register?

Ian


Re: New RTL instruction for my port

2009-12-15 Thread Jean Christophe Beyler
EPILOGUE_USES does not seem to work, the code still gets optimized out.

However, unspec_volatile works but then, as you have said, the
compiler doesn't optimize out things that it then could.

I have for example an instruction to set this special register.
Theoretically, if we had :

set (x);
set (y);

The compiler should remove the first set. Which it does if I remove
the volatile but keep a retrieval of the special register in the
function.

Any other ideas by any chance?

Thanks again,
Jc

On Tue, Dec 15, 2009 at 10:20 AM, Daniel Jacobowitz  wrote:
> On Tue, Dec 15, 2009 at 10:08:02AM -0500, Jean Christophe Beyler wrote:
>> You are correct. So I should be changing things in the adjust_cost
>> function instead.
>>
>> I was also wondering, these instructions modify an internal register
>> that has been set as a fixed register. However, the compiler optimizes
>> them out when the accumulator is not retrieved for a calculation. How
>> can I tell the compiler that it should not remove these instructions.
>>
>> Here is an example code:
>>
>> uint64_t foo (uint64_t x, uint64_t y)
>> {
>> uint64_t z;
>>
>> __builtin_newins (x,y); /* Modifies the accumulator */
>>
>> z = __builtin_retrieve_accum (); /* Retrieve the accumulator */
>>
>> return z;
>> }
>>
>> If I remove the instruction "z = ...;", then the compiler will
>> optimize out my first builtin call.
>
> I suppose you could use EPILOGUE_USES to say that changes to the
> accumulator should not be discarded.  You could also use
> unspec_volatile instead of unspec, but that may further inhibit
> optimization.
>
> --
> Daniel Jacobowitz
> CodeSourcery
>


Re: a question about argument ARG_POINTER_REGNUM

2009-12-15 Thread Ivan Shcherbakov
Hi, Ian,
 
For i386-gcc, this seems to happen during the global register allocation
pass. This corresponds to the IRA pass of gcc 4.4.x. I have attached the
corresponding RTL dump files.

-- 
Best regards,
 Ivan Shcherbakov mailto:shcherba...@eit.uni-kl.de
 TU Kaiserslautern, Germany
 Department of Real-Time Systems

2.c.175r.lreg
Description: Binary data


2.c.176r.greg
Description: Binary data


Re: a question about argument ARG_POINTER_REGNUM

2009-12-15 Thread Ian Lance Taylor
Ivan Shcherbakov  writes:

> For i386-gcc, this seems to happen during the global register allocation
> pass. This corresponds to the IRA pass of gcc 4.4.x. I have attached the
> corresponding RTL dump files.

That means that reload is where the register is eliminated, as
expected.  Reload is really part of register allocation, and it does
not have a separate dump file.

Ian


Re: [RFC] LTO and debug information

2009-12-15 Thread Richard Guenther
On Tue, 15 Dec 2009, Diego Novillo wrote:

> On Sun, Dec 13, 2009 at 15:51, Richard Guenther  wrote:
> 
> > + /* ???  We could free non-constant DECL_SIZE, DECL_SIZE_UNIT
> > +    and DECL_FIELD_OFFSET.  But it's cheap enough to not do
> > +    that and refrain from adding workarounds to dwarf2out.c  */
> > +
> > + /* DECL_FCONTEXT is only used for debug info generation.  */
> > + if (TREE_CODE (decl) == FIELD_DECL
> > +     && debug_info_level < DINFO_LEVEL_TERSE)
> > +   DECL_FCONTEXT (decl) = NULL_TREE;
> > +
> 
> Yes, keeping a reminder here will help future work in early debug
> generation.  As for the rest of the patch, anything that gets us
> closer to a working -g -flto is better than nothing.  Not clearing
> these fields should not be a big deal for now.
> 
> The patch is OK with the reminders that Michael suggested.

This is what I committed after re-bootstrapping and testing with
the above change.

Richard.

2009-12-12  Richard Guenther  

* tree.c (free_lang_data_in_binfo): Do not free BINFO_OFFSET
and BINFO_VPTR_FIELD.
(free_lang_data_in_type): Do not free TYPE_STUB_DECL if we
generate debug information.
(free_lang_data_in_decl): Do not free DECL_SIZE_UNIT,
DECL_SIZE, DECL_FIELD_OFFSET and DECL_FCONTEXT.
(free_lang_data): Do not disable debuginfo.
* lto-streamer-out.c (write_symbol_vec): Deal with
non-constant DECL_SIZE.
(pack_ts_base_value_fields): Write types with false
TREE_ASM_WRITTEN.
(lto_output_ts_type_tree_pointers): Stream TYPE_STUB_DECL.
* lto-streamer-in.c (lto_input_ts_type_tree_pointers): Stream
TYPE_STUB_DECL.
* dwarf2out.c (add_pure_or_virtual_attribute): Check for
DECL_CONTEXT.
(gen_type_die_for_member): Test for TYPE_STUB_DECL.
* opts.c (decode_options): Do not disable var-tracking for lto.
* doc/invoke.texi (-flto): Document -flto vs. -g experimental
status.
(-fwhopr): Document experimental status.

lto/
* lto.c (lto_fixup_field_decl): Fixup DECL_FIELD_OFFSET.
(lto_post_options): Do not disable debuginfo.

Index: gcc/tree.c
===
*** gcc/tree.c.orig 2009-12-12 01:14:07.0 +0100
--- gcc/tree.c  2009-12-13 21:48:20.0 +0100
*** free_lang_data_in_binfo (tree binfo)
*** 4152,4164 
  
gcc_assert (TREE_CODE (binfo) == TREE_BINFO);
  
-   BINFO_OFFSET (binfo) = NULL_TREE;
BINFO_VTABLE (binfo) = NULL_TREE;
-   BINFO_VPTR_FIELD (binfo) = NULL_TREE;
BINFO_BASE_ACCESSES (binfo) = NULL;
BINFO_INHERITANCE_CHAIN (binfo) = NULL_TREE;
BINFO_SUBVTT_INDEX (binfo) = NULL_TREE;
-   BINFO_VPTR_FIELD (binfo) = NULL_TREE;
  
for (i = 0; VEC_iterate (tree, BINFO_BASE_BINFOS (binfo), i, t); i++)
  free_lang_data_in_binfo (t);
--- 4152,4161 
*** free_lang_data_in_type (tree type)
*** 4253,4259 
  }
  
TYPE_CONTEXT (type) = NULL_TREE;
!   TYPE_STUB_DECL (type) = NULL_TREE;
  }
  
  
--- 4250,4257 
  }
  
TYPE_CONTEXT (type) = NULL_TREE;
!   if (debug_info_level < DINFO_LEVEL_TERSE)
! TYPE_STUB_DECL (type) = NULL_TREE;
  }
  
  
*** free_lang_data_in_decl (tree decl)
*** 4380,4408 
 }
 }
  
!   if (TREE_CODE (decl) == PARM_DECL
!   || TREE_CODE (decl) == FIELD_DECL
!   || TREE_CODE (decl) == RESULT_DECL)
! {
!   tree unit_size = DECL_SIZE_UNIT (decl);
!   tree size = DECL_SIZE (decl);
!   if ((unit_size && TREE_CODE (unit_size) != INTEGER_CST)
! || (size && TREE_CODE (size) != INTEGER_CST))
!   {
! DECL_SIZE_UNIT (decl) = NULL_TREE;
! DECL_SIZE (decl) = NULL_TREE;
!   }
  
!   if (TREE_CODE (decl) == FIELD_DECL
! && DECL_FIELD_OFFSET (decl)
! && TREE_CODE (DECL_FIELD_OFFSET (decl)) != INTEGER_CST)
!   DECL_FIELD_OFFSET (decl) = NULL_TREE;
! 
!   /* DECL_FCONTEXT is only used for debug info generation.  */
!   if (TREE_CODE (decl) == FIELD_DECL)
!   DECL_FCONTEXT (decl) = NULL_TREE;
! }
!   else if (TREE_CODE (decl) == FUNCTION_DECL)
  {
if (gimple_has_body_p (decl))
{
--- 4378,4393 
 }
 }
  
!  /* ???  We could free non-constant DECL_SIZE, DECL_SIZE_UNIT
! and DECL_FIELD_OFFSET.  But it's cheap enough to not do
! that and refrain from adding workarounds to dwarf2out.c  */
! 
!  /* DECL_FCONTEXT is only used for debug info generation.  */
!  if (TREE_CODE (decl) == FIELD_DECL
!  && debug_info_level < DINFO_LEVEL_TERSE)
!DECL_FCONTEXT (decl) = NULL_TREE;
  
!  if (TREE_CODE (decl) == FUNCTION_DECL)
  {
if (gimple_has_body_p (decl))
{
*** free_lang_data (void)
*** 4977,4989 
diagnostic_finalizer (global_dc) = default_diagnostic_finalizer;
diagnostic_format_decoder (global_dc) = default_tree_printer;
  
-   /* FIXME. We remove sufficient 

Re: CFI statements vs. -pg

2009-12-15 Thread Thomas Schwinge
Hello!

On 2009-12-15 10:15, Richard Earnshaw wrote:
> On Mon, 2009-12-14 at 19:18 +0100, Thomas Schwinge wrote:
>>  .LCFI0:
>> .cfi_def_cfa_offset 8
>> +   push    {lr}
>> +   bl  __gnu_mcount_nc
>> .loc 1 4 0
>> mov r0, #33
>> 
>> Shouldn't ``.cfi_adjust_cfa_offset 4'' or equivalent be emitted, too?  If
>> I'm interpreting the .debug_frame correctly that is generated directly by
>> GCC without using CFI statements, it seems to have the same problem.  Or
>> am I misunderstanding something?
>
> __gnu_mcount_nc is magic, it will pop that stack value before returning;
> so while there's a slight inconsistency for those two instructions,
> everything will be correct for the main body of the function.

Yes, that's correct -- I was indeed concerned about the invalid CFA that
occurs between push {lr}; bl __gnu_mcount_nc and until __gnu_mcount_nc
does its final pop {r0-r3, ip, lr}.

To give some context: I'm currently adding CFI statements to all (well,
most) of the assembly code in glibc's ARM sysdep files to help GDB figure
out frame unwinding in the case that its heuristics fail; for example in
all cases where cancellable syscalls are involved in multi-threaded
processes.

And of course, if the CFA is invalid already when entering
__gnu_mcount_nc, it's difficult for me to do the CFI annotation correctly
in there.  (Or I begin __gnu_mcount_nc with .cfi_adjust_cfa_offset 4
which seems a bit ugly.)


> Anyway, this isn't the right place for this; could you raise a bug report
> in bugzilla please?

Done: 


Regards,
 Thomas




Re: detailed comparison of generated code size for GCC and other compilers

2009-12-15 Thread John Regehr
Also, we're not running LTO in any compiler and we removed all "static" 
declarations from the code to keep compilers from making closed-world 
assumptions.


John Regehr


Re: generate RTL sequence

2009-12-15 Thread Jeff Law

On 12/10/09 18:33, daniel tian wrote:

Hi,
 I have a problem about RTL sequence.
 If I wanna generate the RTL in sequence, and don't let gcc to schedule 
them.
   
Then you need to generate them as a single insn which outputs multiple 
instructions.


Jeff



Re: detailed comparison of generated code size for GCC and other compilers

2009-12-15 Thread Andreas Schwab
John Regehr  writes:

>> I would only be worried for cases where no warning is issued *and*
>> uninitialized accesses are eliminated.
>
> Yeah, it would be excellent if GCC maintained the invariant that for all
> uses of uninitialized storage, either the compiler or else valgrind will
> issue a warning.

If GCC cannot prove that an object is uninitialized it cannot optimize
based on that assumption, meaning that the access is likely to happen
unless it is dead for other reasons.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: a question about argument ARG_POINTER_REGNUM

2009-12-15 Thread Ivan Shcherbakov
Hi, Ian,
 
Thank you for the information about the register allocation sequence.
My problem was solved by adding an AP-to-FP entry to ELIMINABLE_REGS.
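
For readers unfamiliar with the macro, the sketch below shows the general shape of an ELIMINABLE_REGS initializer (a list of {from, to} register pairs) with an AP-to-FP entry present. Everything prefixed `SKETCH_` or `sketch_` is hypothetical: the register numbers are read off the listing and RTL dump earlier in this thread (r1 as stack pointer, r4 as frame pointer, 16 as argp), and the actual definition in the msp430 port is not shown here:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register numbers, read off the dumps earlier in the thread
   (r1 = stack pointer, r4 = frame pointer, 16 = argp). */
#define SKETCH_SP_REGNUM   1
#define SKETCH_FP_REGNUM   4
#define SKETCH_ARGP_REGNUM 16

/* Same shape as a target's ELIMINABLE_REGS initializer: {from, to} pairs. */
#define SKETCH_ELIMINABLE_REGS                     \
    {{SKETCH_ARGP_REGNUM, SKETCH_SP_REGNUM},       \
     {SKETCH_ARGP_REGNUM, SKETCH_FP_REGNUM},       \
     {SKETCH_FP_REGNUM,   SKETCH_SP_REGNUM}}

struct elim_pair { int from; int to; };

static const struct elim_pair sketch_table[] = SKETCH_ELIMINABLE_REGS;

/* Return 1 if the table lists an elimination from `from` to `to`. */
int sketch_has_elimination(int from, int to)
{
    size_t i;
    for (i = 0; i < sizeof sketch_table / sizeof sketch_table[0]; i++)
        if (sketch_table[i].from == from && sketch_table[i].to == to)
            return 1;
    return 0;
}
```

With the second pair in place, a lookup for argp-to-FP succeeds; without it, reload would have no way to rewrite argp in terms of the frame pointer when the frame pointer is needed.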

I also encountered another minor problem. When GCC tries to generate a
"push SP" instruction (e.g. some_func(&the_only_local_var);), it is
detected during reload, and the "frame pointer required" condition is
forcibly set. I have noticed that the i386 port does not have this
problem, as "push" only works for non-stack-related registers. I tried
to implement a similar solution on my MSP430 target; however,
restricting the push instruction and adding a split statement causes an
"unrecognizable insn" error for "push {virtual-stack-vars}". I used
the following MD code:

(define_insn "*pushhi"
  [(set (match_operand:HI 1 "push_operand" "=<")
(match_operand:HI 0 "general_no_elim_operand" "rim"))]
  ""
  "* return msp430_pushhi(insn, operands, NULL);"
  [(set_attr "length" "2")])

(define_split
  [(match_scratch:HI 2 "r")
   (set (match_operand:HI 1 "push_operand" "=<")
(match_operand:HI 0 "general_operand" "rim"))]
  ""
  [(set (match_dup 2) (match_dup 0))
   (set (match_dup 1) (match_dup 2))]
  "")

Do you know any working way of telling GCC to use a temporary scratch
register when the normal push INSN cannot be used?

-- 
Best regards,
 Ivan Shcherbakov mailto:shcherba...@eit.uni-kl.de
 TU Kaiserslautern, Germany
 Department of Real-Time Systems



gcc-4.4-20091215 is now available

2009-12-15 Thread gccadmin
Snapshot gcc-4.4-20091215 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20091215/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 155276

You'll find:

gcc-4.4-20091215.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.4-20091215.tar.bz2 C front end and core compiler

gcc-ada-4.4-20091215.tar.bz2  Ada front end and runtime

gcc-fortran-4.4-20091215.tar.bz2  Fortran front end and runtime

gcc-g++-4.4-20091215.tar.bz2  C++ front end and runtime

gcc-java-4.4-20091215.tar.bz2 Java front end and runtime

gcc-objc-4.4-20091215.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.4-20091215.tar.bz2  The GCC testsuite

Diffs from 4.4-20091208 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


updated code size comparison

2009-12-15 Thread John Regehr

[cross-posting to the GCC and LLVM lists]

I've updated the code size results here:

  http://embed.cs.utah.edu/embarrassing/dec_09/

The changes for this run were:

- delete a number of testcases that contained use of uninitialized local
variables

- turn off frame pointer emission for all compilers

- ask all compilers to target x86 + SSE3

- ask all compilers to not emit stack protector code

- run unix2dos on the .c files so people on Windows don't see all the
lines running together

Hopefully the results are more fair and useful now.  Again, feedback is
appreciated.

Once people are happy with how these results are obtained, I'll plan on
just re-running the scripts every few months so we can see how the
compilers evolve.  Also there are many possibilities for enhancement
including adding new architectures, harvesting more and larger
functions, and harvesting C++ code.

Thanks,

John Regehr




Re: updated code size comparison

2009-12-15 Thread Miles Bader
John Regehr  writes:
> I've updated the code size results here:
>
>   http://embed.cs.utah.edu/embarrassing/dec_09/

The thing that bothers me about this is that you seem to put a lot of
emphasis on the test "X generated larger code than Y" without any
reflection of how much larger (it could be 1 byte, it could be 50
times).

Moreover, aggregating those boolean results to yield things like "X
generated larger code than Y NN% of the time" seems even weirder.
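
A small sketch of why the two views can disagree. The helper names are invented for illustration and have nothing to do with the scripts behind the published results:

```c
#include <assert.h>
#include <stddef.h>

/* Count the test cases where compiler X's code is larger than Y's. */
size_t count_x_larger(const unsigned *x, const unsigned *y, size_t n)
{
    size_t i, count = 0;
    assert(x != NULL && y != NULL);
    for (i = 0; i < n; i++)
        if (x[i] > y[i])
            count++;
    return count;
}

/* Sum of code sizes, for comparing totals instead of the win count. */
unsigned long total_size(const unsigned *sizes, size_t n)
{
    size_t i;
    unsigned long total = 0;
    assert(sizes != NULL);
    for (i = 0; i < n; i++)
        total += sizes[i];
    return total;
}
```

With made-up sizes x = {101, 101, 101, 50} and y = {100, 100, 100, 500}, X is "larger" in 3 of 4 cases, each time by one byte, yet its total is 353 bytes against 800 for Y.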

Is this really useful information, other than for marketing?

-Miles

-- 
Vote, v. The instrument and symbol of a freeman's power to make a fool of
himself and a wreck of his country.