RE: VREGS fails to handle subreg of mem

2014-04-02 Thread Claudiu Zissulescu
Thank you for the feedback. I found the issue in the mode_dependent_address_p
hook.

//Claudiu

> -----Original Message-----
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Monday, March 31, 2014 10:21 PM
> To: Claudiu Zissulescu
> Cc: gcc@gcc.gnu.org; Francois Bedard; claz...@gmail.com
> Subject: Re: VREGS fails to handle subreg of mem
> 
> > In our ARC port, we found the following situation after expand:
> >
> > (insn 23 22 24 5 (set (reg:SI 176)
> > (subreg:SI (mem/c:DI (plus:SI (reg/f:SI 147 virtual-stack-vars)
> > (const_int -268 [0xfef4])) [3
> > tmpoutst.st_size+0 S8 A32]) 4)) t02.c:64 -1 (nil))
> >
> > The virtual-stack-vars should be handled by GCC's VREGS step, in
> > instantiate_virtual_regs_in_insn(). However, this is not happening as
> > the subroutine is not designed to handle subregs of a mem. As a
> > consequence, virtual-stack-vars is not eliminated, and the compilation
> > fails later on.
> > To solve this issue, I am proposing the attached patch on vregs, that
> > implements handling of such situation by
> > instantiate_virtual_regs_in_insn().
> >
> > Can you please let me know if this is an acceptable solution for the
> > given issue?
> 
> Very likely not, there should be no SUBREGs of MEMs after expand.
> 
> --
> Eric Botcazou


-fleading-underscore is not working as expected.

2014-04-02 Thread Umesh Kalappa
Dear All,

We enabled the switch "-fleading-underscore" to emit global symbol
names with a leading underscore (_).

The respective C source file

int a=10;
int b=10,c;

int test()
{
  c = a + b;
  tes();
  return c;
}

and respective asm file

.global _a
.section .data
.align  1
.type   _a, %object
.size   _a, 2
_a:
.word   10
.global _b
.align  1
.type   _b, %object
.size   _b, 2
_b:
.word   10
.comm   _c, 2,2
.section .text
.align  1
.global _test
.type   _test, %function
_test:
ld  HL, (a)
ld  DE, (b)
add DE, HL
ld  (c), DE
cal _tes
ld  DE, (c)
ld  WA, DE
ret


As the asm shows, the global symbol names are prefixed with _ in the
definitions, but not in the uses.

I'm sure we are missing something here w.r.t. the -fleading-underscore flag;
the GCC source is 4.8.1.

Any help will be appreciated.

Thank you
~Umesh


Help needed with zero/sign extension

2014-04-02 Thread Anthony Green

One embarrassing feature of the moxie compiler port is that it really
doesn't understand how to promote integral types.  Moxie cores
zero-extend all loads, but the compiler still shifts loaded values back
and forth to zero out the upper bits.

So...

unsigned int foo (unsigned char *c)
{
  return *c;
}

...results in...

foo:
ldi.l  $r1, 24
ld.b   $r0, ($r0)
ashl   $r0, $r1
lshr   $r0, $r1
ret
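
To make the redundancy concrete: on a 32-bit register, shifting left by 24 and then logically right by 24 keeps only the low 8 bits, which is exactly what a zero-extending byte load already leaves behind. A minimal C sketch of that equivalence (the function names are illustrative only and are not part of the moxie port):

```c
#include <assert.h>
#include <stdint.h>

/* What the shift pair (ashl/lshr by 24) computes on a 32-bit value.  */
uint32_t zext8_by_shifts (uint32_t reg)
{
  return (reg << 24) >> 24;   /* logical shifts on an unsigned type */
}

/* What a zero-extending byte load (like ld.b) leaves in the register.  */
uint32_t zext8_by_load (const unsigned char *p)
{
  return (uint32_t) *p;       /* upper 24 bits are already zero */
}

/* Load, then apply the shift pair: the shifts change nothing.  */
uint32_t load_then_shift (const unsigned char *p)
{
  uint32_t loaded = zext8_by_load (p);
  uint32_t shifted = zext8_by_shifts (loaded);
  assert (shifted == loaded);
  return shifted;
}
```

Since the two results always agree after a byte load, the ldi.l/ashl/lshr sequence in the output above does no useful work.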

I thought the answer was to simply add something like this...

(define_insn "zero_extendqisi"
  [(set (match_operand:SI 0 "register_operand" "=r")
(zero_extend:SI (match_operand:QI 1 "register_operand" "r")))]
  ""
  "; ZERO EXTEND (comment for debugging)")

But nothing changes in the example above.

However, the following code...

unsigned int p;
void foo (unsigned char *c)
{
  p = *c;
}

...does result in the correct output...

foo:
ld.b   $r0, ($r0)
; ZERO EXTEND (comment for debugging)
sta.l  p, $r0
ret


Any advice?  I'd really like to take care of this because the compiler
output is pretty bloated right now.

Here's what I've been testing with.  I'm not sure what I'm missing...

(define_insn "zero_extendqisi"
  [(set (match_operand:SI 0 "register_operand" "=r")
(zero_extend:SI (match_operand:QI 1 "register_operand" "r")))]
  ""
  "; ZERO EXTEND (comment for debugging)")

(define_expand "movqi"
  [(set (match_operand:QI 0 "general_operand" "")
(match_operand:QI 1 "general_operand" ""))]
  ""
  "
{
  /* If this is a store, force the value into a register.  */
  if (MEM_P (operands[0]))
operands[1] = force_reg (QImode, operands[1]);
}")

(define_insn "*movqi"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,W,A,r,r,B,r")
        (match_operand:QI 1 "moxie_general_movsrc_operand" "O,r,i,r,r,W,A,r,B"))]
  "register_operand (operands[0], QImode)
   || register_operand (operands[1], QImode)"
  "@
   xor    %0, %0
   mov    %0, %1
   ldi.b  %0, %1
   st.b   %0, %1
   sta.b  %0, %1
   ld.b   %0, %1
   lda.b  %0, %1
   sto.b  %0, %1
   ldo.b  %0, %1"
  [(set_attr "length"   "2,2,6,2,6,2,6,6,6")])


Thanks!

AG


Re: Help needed with zero/sign extension

2014-04-02 Thread Joern Rennecke
On 2 April 2014 13:08, Anthony Green wrote:

> I thought the answer was to simply add something like this...
>
> (define_insn "zero_extendqisi"
>   [(set (match_operand:SI 0 "register_operand" "=r")
> (zero_extend:SI (match_operand:QI 1 "register_operand" "r")))]
>   ""
>   "; ZERO EXTEND (comment for debugging)")

That pattern is obviously not outputting valid code.
You should make this a define_insn_and_split, with an r/r alternative
that is split (after reload) as necessary into shifts, and an m/r alternative
that outputs a load.  Sprinkle with rtx_cost adjustments as necessary.

> But nothing changes in the example above.

LOAD_EXTEND_OP can also avoid some unnecessary expansions.

Although we still have a long-standing issue of unnecessary extensions for
narrow integer types passed in/out of functions, and loaded from volatile
memory.


Re: WPA stream_out form & memory consumption

2014-04-02 Thread Martin Liška


On 03/27/2014 10:48 AM, Martin Liška wrote:

The previous patch is wrong; I made a mistake in the name ;)

Martin

On 03/27/2014 09:52 AM, Martin Liška wrote:


On 03/25/2014 09:50 PM, Jan Hubicka wrote:

Hello,
I've been compiling Chromium with LTO and I noticed that WPA
stream_out forks and runs in parallel:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02621.html.

I am unable to fit in 16GB of memory: ld uses about 8GB and lto1 about
6GB. When WPA starts to fork, memory consumption increases so much that
lto1 is killed. I would appreciate a --param option to disable this
WPA fork. The number of forks is taken from the build system (-flto=9),
which is fine for the ltrans phase, because ld releases the aforementioned
8GB.

What do you think about that?

I can take a look - our measurements suggested that the WPA memory will
be later dominated by ltrans.  Perhaps Chromium does something that makes
WPA explode; that would be interesting to analyze.  I did not manage
to get through the Chromium LTO build process recently (ninja builds are not
my friends), can you send me the instructions?

Honza

Thanks,
Martin


Here are instructions for building Chromium with LTO:
1) Install depot-tools and export the PATH variable according to the guide:
http://www.chromium.org/developers/how-tos/install-depot-tools
2) Check out the source code: gclient sync; cd src
3) Apply the patch (enables the system gold linker and disables LTO for a
sandbox that uses top-level asm)
4) which ld should point to ld.gold
5) Ensure that ld.bfd points to ld.bfd
6) Run: build/gyp_chromium -Dwerror=
7) ninja -C out/Release chrome -jX

If there are any problems, follow:
https://code.google.com/p/chromium/wiki/LinuxBuildInstructions


Martin





Hello,
  taking the latest trunk GCC, I built Firefox and Chromium. Both projects
were compiled without debugging symbols and with -O2 on an 8-core machine.


Firefox:
-flto=9, peak memory usage (in LTRANS): 11GB

Chromium:
-flto=6, peak memory usage (in parallel WPA phase ): 16.5GB

For details please see the attachment with graphs. The attachment also
contains the -fmem-report and -fmem-report-wpa output.
I think the reduced memory footprint of ~3.5GB claimed here is a bit optimistic:
http://gcc.gnu.org/gcc-4.9/changes.html


Is there any way we can reduce the memory footprint?

Attachment (due to size restriction): 
https://drive.google.com/file/d/0B0pisUJ80pO1bnV5V0RtWXJkaVU/edit?usp=sharing


Thank you,
Martin


Access Error in classify_object_over_fdes on sparc-rtems

2014-04-02 Thread Joel Sherrill
Hi

I am sure this is a simple mistake in our linker script but what
magic incantation, symbols, sections, end marker, etc. are
assumed to be properly constructed before this method works?

A pointer to the right magic in the standard sparc-elf linker script
would likely be sufficient for me to fix this.

An explanation of what it is accomplishing would also be appreciated.

Thanks.

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985



HARD_REGNO_CALL_PART_CLOBBERED and regs_invalidated_by_call

2014-04-02 Thread Matthew Fortune
Hi Richard,

As part of implementing the new O32 FPXX ABI I am making use of the
HARD_REGNO_CALL_PART_CLOBBERED macro to allow odd-numbered floating-point
registers to be considered as 'normally' callee-saved but call clobbered if
they are being used to hold SImode or SFmode data. The macro is implemented as:

/* Odd numbered single precision registers are not considered call saved
   for O32 FPXX as they will be clobbered when run on an FR=1 FPU.  */
#define HARD_REGNO_CALL_PART_CLOBBERED(REGNO, MODE) \
  (TARGET_FLOATXX && ((MODE) == SFmode || (MODE) == SImode) \
   && FP_REG_P (REGNO) && (REGNO & 1))

IRA and LRA appear to work correctly w.r.t. HARD_REGNO_CALL_PART_CLOBBERED and
I get the desired O32 FPXX ABI behaviour. However, when writing a number of
tests for this I triggered some optimisations (in particular regcprop) which
ignored the fact that the odd-numbered single-precision registers are clobbered
across calls and essentially undid the work IRA/LRA did in treating the
register as clobbered. The reason for regcprop ignoring the call-clobbered
nature of these registers is that it simply does not check. The test for
call-clobbered registers relies solely on regs_invalidated_by_call, which is
not (and cannot be) aware of the HARD_REGNO_CALL_PART_CLOBBERED macro as it has
no information about what mode registers are in when it is used. A proposed fix
for this specific issue is inline below.

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 101de76..cb2937c 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -1030,8 +1030,10 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
}
}

- EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, regno, hrsi)
-   if (regno < set_regno || regno >= set_regno + set_nregs)
+ for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+   if ((TEST_HARD_REG_BIT (regs_invalidated_by_call, regno)
+|| HARD_REGNO_CALL_PART_CLOBBERED (regno, vd->e[regno].mode))
+   && (regno < set_regno || regno >= set_regno + set_nregs))
  kill_value_regno (regno, 1, vd);

  /* If SET was seen in CALL_INSN_FUNCTION_USAGE, and SET_SRC

The problem is that there are several other passes that rely solely on
regs_invalidated_by_call to determine call-clobbered status and will therefore
make the same mistake. Some of these passes simply don't have mode information
around when handling call-clobbered registers, which leaves me a little unsure
of the best solution in those cases. Being over-cautious and always marking a
potentially clobbered register as clobbered seems like one option, but there is
a risk that doing so could break legitimate use of a callee-saved register (in
a mode that is not part clobbered).  Essentially I would propose introducing
another register set 'regs_maybe_invalidated_by_call' that includes all of
regs_invalidated_by_call and anything HARD_REGNO_CALL_PART_CLOBBERED reports
true for when checking all registers against all modes. Wherever call-clobbered
information is required but mode information is unavailable,
regs_maybe_invalidated_by_call would then be used.  As I said though, there are
probably some corner cases to handle too.
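
To illustrate the proposal, here is a small stand-alone C model. It is only a sketch under stated assumptions: the 32-register file, the choice of registers 16..31 as FP registers, the three toy modes and every toy_* name are hypothetical and are not GCC code. It shows the conservative set as the union over all modes, next to the mode-aware check that the regcprop change performs when the mode is known.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: 32 hard registers tracked in one 32-bit mask.  */
enum toy_mode { TOY_SImode, TOY_SFmode, TOY_DFmode, TOY_NUM_MODES };

#define TOY_NUM_REGS 32
#define TOY_FIRST_FP_REG 16   /* hypothetical layout: regs 16..31 are FP */

/* Stand-in for HARD_REGNO_CALL_PART_CLOBBERED: odd FP registers are
   clobbered across calls when they hold SImode or SFmode data.  */
int toy_call_part_clobbered (unsigned regno, enum toy_mode mode)
{
  assert (regno < TOY_NUM_REGS);
  return (mode == TOY_SFmode || mode == TOY_SImode)
         && regno >= TOY_FIRST_FP_REG && (regno & 1);
}

/* Conservative set: everything fully call-clobbered, plus every register
   that is part-clobbered in at least one mode.  */
uint32_t toy_regs_maybe_invalidated_by_call (uint32_t regs_invalidated_by_call)
{
  uint32_t result = regs_invalidated_by_call;
  for (unsigned regno = 0; regno < TOY_NUM_REGS; regno++)
    for (int mode = 0; mode < TOY_NUM_MODES; mode++)
      if (toy_call_part_clobbered (regno, (enum toy_mode) mode))
        result |= UINT32_C (1) << regno;
  return result;
}

/* Mode-aware test, for passes that do know the mode of the register.  */
int toy_clobbered_by_call (uint32_t regs_invalidated_by_call,
                           unsigned regno, enum toy_mode mode)
{
  assert (regno < TOY_NUM_REGS);
  return ((regs_invalidated_by_call >> regno) & 1)
         || toy_call_part_clobbered (regno, mode);
}
```

In this model an odd FP register holding DFmode data is preserved by the mode-aware check but is still a member of the conservative set, which is the over-cautious trade-off described above.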

I don't quite have the O32 FPXX patches ready to send out yet, but this issue is
relevant to all architectures using HARD_REGNO_CALL_PART_CLOBBERED; presumably
nobody has hit it yet though.

Regards,
Matthew


Re: Help needed with zero/sign extension

2014-04-02 Thread Jeff Law

On 04/02/14 06:08, Anthony Green wrote:


> One embarrassing feature of the moxie compiler port is that it really
> doesn't understand how to promote integral types.  Moxie cores
> zero-extend all loads, but the compiler still shifts loaded values back
> and forth to zero out the upper bits.
I'm a bit surprised LOAD_EXTEND_OP doesn't cover this for you.  Maybe 
other aspects of the moxie are getting in the way:



(insn 7 6 8 2 (set (reg:SI 32)
        (const_int 24 [0x18])) j.c:4 19 {*movsi}
     (nil))
(insn 8 7 10 2 (set (reg:SI 30 [ D.1371 ])
        (ashift:SI (subreg:SI (mem:QI (reg:SI 2 $r0 [ c ]) [0 *c_2(D)+0 S1 A8]) 0)
            (reg:SI 32))) j.c:4 14 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 2 $r0 [ c ])
        (nil)))
(note 10 8 15 2 NOTE_INSN_DELETED)
(insn 15 10 16 2 (set (reg/i:SI 2 $r0)
        (lshiftrt:SI (reg:SI 30 [ D.1371 ])
            (reg:SI 32))) j.c:5 16 {lshrsi3}
     (expr_list:REG_DEAD (reg:SI 32)
        (expr_list:REG_DEAD (reg:SI 30 [ D.1371 ])
            (nil


Looks problematical.  The shift count is used twice, so combine's going 
to have a bit of a tough time with this.


Perhaps allow constant shift counts then force them into registers after 
combine with splitters?


jeff




Re: Help needed with zero/sign extension

2014-04-02 Thread Anthony Green
Joern Rennecke writes:

> On 2 April 2014 13:08, Anthony Green wrote:
>
>> I thought the answer was to simply add something like this...
>>
>> (define_insn "zero_extendqisi"
>>   [(set (match_operand:SI 0 "register_operand" "=r")
>> (zero_extend:SI (match_operand:QI 1 "register_operand" "r")))]
>>   ""
>>   "; ZERO EXTEND (comment for debugging)")
>
> That pattern is obviously not outputting valid code.

It's actually just a valid assembler comment so I can see if the pattern
is used.

> You should make this a define_insn_and_split, with an r/r alternative
> that is split (after reload) as necessary into shifts, and an m/r alternative
> that outputs a load.  Sprinkle with rtx_cost adjustments as necessary.

Thanks for the tip.  I have it working now!

AG


Re: [patch] Fix texinfo warnings for doc/gcc.texi [was: Re: doc bugs]

2014-04-02 Thread Tobias Burnus

*PING*

Tobias Burnus wrote:

H.J. Lu wrote:
On Fri, Mar 28, 2014 at 12:41 PM, Mike Stump wrote:

Since we are nearing release, I thought I'd mention I see:
../../gcc/gcc/doc/invoke.texi:1114: warning: node next `Overall Options' in
menu `C Dialect Options' and in sectioning `Invoking G++' differ

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59055


I think one reason that there are (and were) so many warnings is
that texinfo only recently gained support for diagnosing these issues.
(Or maybe not that recently, but distributions were slow in adopting
newer texinfo versions.)


Attached is a warning-removal patch.
OK for the trunk?

Regarding invoke.texi: It had (nearly) the same @menu twice, once
under the @chapter where it belongs and once under a @section where it
doesn't.


Tobias




Re: Help needed with zero/sign extension

2014-04-02 Thread Anthony Green
Jeff Law writes:

> On 04/02/14 06:08, Anthony Green wrote:
>>
>> One embarrassing feature of the moxie compiler port is that it really
>> doesn't understand how to promote integral types.  Moxie cores
>> zero-extend all loads, but the compiler still shifts loaded values back
>> and forth to zero out the upper bits.
> I'm a bit surprised LOAD_EXTEND_OP doesn't cover this for you.  Maybe
> other aspects of the moxie are getting in the way:
>
>
> (insn 7 6 8 2 (set (reg:SI 32)
> (const_int 24 [0x18])) j.c:4 19 {*movsi}
>  (nil))
> (insn 8 7 10 2 (set (reg:SI 30 [ D.1371 ])
> (ashift:SI (subreg:SI (mem:QI (reg:SI 2 $r0 [ c ]) [0
> *c_2(D)+0 S1 A8]) 0)
> (reg:SI 32))) j.c:4 14 {ashlsi3}
>  (expr_list:REG_DEAD (reg:SI 2 $r0 [ c ])
> (nil)))
> (note 10 8 15 2 NOTE_INSN_DELETED)
> (insn 15 10 16 2 (set (reg/i:SI 2 $r0)
> (lshiftrt:SI (reg:SI 30 [ D.1371 ])
> (reg:SI 32))) j.c:5 16 {lshrsi3}
>  (expr_list:REG_DEAD (reg:SI 32)
> (expr_list:REG_DEAD (reg:SI 30 [ D.1371 ])
> (nil
>
>
> Looks problematical.  The shift count is used twice, so combine's
> going to have a bit of a tough time with this.
>
> Perhaps allow constant shift counts then force them into registers
> after combine with splitters?

Rather than use shifts, I've added "sign-extend byte" and "sign-extend
short" instructions (I have the luxury of a soft-core architecture with
a tiny user base :).  Switching char to unsigned by default was also a
good thing given zero-extend by default.

I've tested RTEMS apps on QEMU, so it's all good so far.  I'll submit
patches tonight.

AG



Re: -fleading-underscore is not working as expected.

2014-04-02 Thread Ian Lance Taylor
On Wed, Apr 2, 2014 at 2:15 AM, Umesh Kalappa wrote:
>
> Was enabled the switch  "-fleading-underscore"  to emit the global
> symbol name with prefix _ .

> ld  HL, (a)
>
> ld  DE, (b)
>
> add DE, HL
>
> ld  (c), DE
>
> cal _tes
>
> ld  DE, (c)
>
> ld  WA, DE
>
> ret
>
>
> if you see the asm ,the global symbol names was prefixed with _ in the
> definition ,But not in the uses.
>
> I'm sure we are missing something here w.r.t -fleading-underscore flag
> and gcc source is 4.8.1.

You didn't mention which target you are using, and I don't recognize
it.  If this is a private target, your backend files are missing some
uses of %U (%U is a directive for asm_fprintf).
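
As a rough illustration of the difference, the following stand-alone C sketch formats one of the loads from the report twice: once with the symbol name printed raw, and once with a prefix applied at the point of use, which is the job %U does inside an asm_fprintf format string. All toy_* names and the fixed "_" prefix are assumptions made for this example; this is not backend code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the prefix that -fleading-underscore selects.  */
static const char *const toy_user_label_prefix = "_";

/* Print a load of global SYM the way the reported output looks: the name
   is emitted raw, so the prefix is missing at the point of use.  */
const char *toy_load_raw (const char *reg, const char *sym)
{
  static char buf[64];
  int n = snprintf (buf, sizeof buf, "ld  %s, (%s)", reg, sym);
  assert (n >= 0 && (size_t) n < sizeof buf);
  return buf;
}

/* Same instruction with the prefix applied at the use.  */
const char *toy_load_prefixed (const char *reg, const char *sym)
{
  static char buf[64];
  int n = snprintf (buf, sizeof buf, "ld  %s, (%s%s)", reg,
                    toy_user_label_prefix, sym);
  assert (n >= 0 && (size_t) n < sizeof buf);
  return buf;
}

/* Does the emitted instruction reference the label as it was defined?  */
int toy_references_label (const char *insn, const char *label)
{
  return strstr (insn, label) != NULL;
}
```

With these helpers, toy_load_raw ("HL", "a") yields "ld  HL, (a)", which does not match the "_a" definition, while toy_load_prefixed ("HL", "a") yields "ld  HL, (_a)".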

Ian


collect2 "-o" argument position problem

2014-04-02 Thread David Guillen
Hello guys,

I don't know whether this is the best place to ask this, but
anyway, here we go:

I have two different command lines for collect2 (I got them by using
-v in gcc) and I found out that the original one does not work because
of the position of "-o" in the parameter list.


Error:
/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/collect2
--sysroot=/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot
--eh-frame-hdr -m elf_i386 -dynamic-linker -o conftest
/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o
/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crti.o
/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtbegin.o
-L/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc
-L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin
-L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/lib
-L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/lib
-L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib
conftest.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc
--as-needed -lgcc_s --no-as-needed
/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtend.o
/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crtn.o
/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin/ld:
cannot find conftest: No such file or directory

No error:
/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/collect2
--sysroot=/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot
--eh-frame-hdr -m elf_i386 -dynamic-linker
/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o
/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crti.o
/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtbegin.o
-L/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc
-L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin
-L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/lib
-L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/lib
-L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib
conftest.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc
--as-needed -lgcc_s --no-as-needed
/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtend.o
/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crtn.o
-o conftest


Any idea on why the parameter parsing fails? Is it a problem or is it
the expected behavior?

Thanks a lot,
David


Re: WPA stream_out form & memory consumption

2014-04-02 Thread Jan Hubicka
> 
> Hello,
>   taking latest trunk gcc, I built Firefox and Chromium. Both
> projects compiled without debugging symbols and -O2 on an 8-core
> machine.
> 
> Firefox:
> -flto=9, peak memory usage (in LTRANS): 11GB
> 
> Chromium:
> -flto=6, peak memory usage (in parallel WPA phase ): 16.5GB

I see; however, the ltrans memory use is about the same later in the game.
> 
> For details please see attached with graphs. The attachment contains
> also -fmem-report and -fmem-report-wpa.
> I think reduced memory footprint to ~3.5GB is a bit optimistic:
> http://gcc.gnu.org/gcc-4.9/changes.html

I will need to re-measure my setup - it is what I got last time with basically
the same configuration.  It depends on parallelism; you should get a sub-4GB peak
with -flto=1, right? We should clarify this in changes.html.
> 
> Is there any way we can reduce the memory footprint?

Looking at the memreport we get for ggc memory:

Chromium:
source location                                        Garbage             Freed              Leak          Overhead       Times
cgraph.c:869 (cgraph_create_edge_1)                    0: 0.0%           0: 0.0%   274319552: 4.8%           0: 0.0%     2637688
cgraph.c:510 (cgraph_allocate_node)                    0: 0.0%           0: 0.0%   426228128: 7.5%           0: 0.0%     1299476
toplev.c:960 (realloc_for_line_map)                    0: 0.0%   357908640: 3.8%  1073743896:18.8%         184: 0.0%          10
tree-streamer-in.c:621 (streamer_alloc_tree)   216054000:86.6%  7623611824:80.2%  2536849136:44.5%    57818592:36.0%    69421368
Total                                          249562346        9504578411        5700671942         160593619          97146243

Firefox:
source location                                        Garbage             Freed              Leak          Overhead       Times
cgraph.c:869 (cgraph_create_edge_1)                    0: 0.0%           0: 0.0%   130358176: 6.9%           0: 0.0%     1253444
cgraph.c:510 (cgraph_allocate_node)                    0: 0.0%           0: 0.0%   182236800: 9.7%           0: 0.0%      555600
toplev.c:960 (realloc_for_line_map)                    0: 0.0%    89503888: 5.5%   268468240:14.3%         160: 0.0%          13
tree-streamer-in.c:621 (streamer_alloc_tree)    93089976:77.5%   972848816:59.6%   639230248:33.9%    21332480:32.3%    13496198
Total                                          120076578        1632997043        1883064062          65981723          24732501

So Chromium uses quite a lot more trees and also seems to have about twice as
many functions.
Next time, it would be useful to include -Q while collecting the data - it shows
individual GGC runs and also memory usage accounted per pass.  That way we would
know whether there are a lot more functions to start with, or just more inlining
going on.

I have an older patch that introduces a cache to line map streaming, reducing
its size quite a bit; that should save some of the memory of
realloc_for_line_map.
I will dig it out and update it to the current tree.

I also wonder where the rest of the memory goes, since the graphs show about
10GB for Firefox.
Some is probably accounting of mmapped files, and also gold's memory usage.
We collect only some of the memory usage that is not in ggc. Vectors:

Chromium:
ipa-cp.c:2421 (grow_edge_clone_vectors)             17225752: 6.9%    17225752         1: 0.0%
vec.h:1393 (copy)                                   17291228: 6.9%   100465316   1499009: 3.7%
lto-cgraph.c:141 (lto_symtab_encoder_encode)        30436272:12.2%    53192752      1460: 0.0%
passes.c:2254 (execute_one_pass)                    53853360:21.6%    83885960   1426939: 3.5%
ipa-inline-analysis.c:974 (inline_summary_alloc)    84406056:33.8%   137806000    484472: 1.2%
Total                                              249721648                    40747241
Firefox:
ipa-cp.c:2421 (grow_edge_clone_vectors)              7753312: 6.1%     7753312         1: 0.0%
ipa-inline-analysis.c:4053 (read_inline_edge_sum     8758216: 6.9%    26420804    909584: 4.9%
ipa-ref.c:54 (ipa_record_reference)                 10747880: 8.4%    20943288    371083: 2.0%
lto-cgraph.c:141 (lto_symtab_encoder_encode)        19756008:15.5%    23548272      1335: 0.0%
passes.c:2254 (execute_one_pass)                    26769688:21.0%    41942904    716378: 3.9%
ipa-inline-analysis.c:974 (inline_summary_alloc)    40110248:31.5%    62026480    284283: 1.5%
Total                                              127480444                    18430703

That seems as usual. 249MB seems acceptable.

Bitmaps seem to be dominated by ipa-reference.  On Chromium this pass seems to
go crazy, having about 80MB of bitmaps.  Perhaps you could try to get data with
-fno-ipa-reference?

We ought to get stats on hashtables, since these probably consume quite some
memory during LTO streaming.
Could you perhaps also get -flto-report?

Honza

Re: collect2 "-o" argument position problem

2014-04-02 Thread Andrew Pinski
On Wed, Apr 2, 2014 at 3:26 PM, David Guillen wrote:
> Hello guys,
>
> I don't know whether this is the best place to ask for this, but
> anyway, here we go:
>
> I have two different commandlines for collect2 (I got them after using
> -v in gcc) and I found out that the original one does not work because
> of the position in the parameter list.

The simple answer is that -dynamic-linker takes an operand.
So in the first case, the operand to -dynamic-linker is -o and in the
second case it is
/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o
.
Both seem wrong.  How is GCC being invoked here?  Do you have
-Wl,-dynamic-linker on the command line?
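
The effect can be reproduced with a toy option scanner in C. This is a sketch only and not the linker's real parser: the struct, the toy_* names and the shortened argument lists are made up for illustration. It shows why "-o" disappears in the failing order and why "conftest" is then treated as an input file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_link_args
{
  const char *dynamic_linker;  /* operand consumed by -dynamic-linker */
  const char *output;          /* operand consumed by -o */
  const char *first_input;     /* first plain argument, i.e. an input file */
};

/* Toy option scanner: an option that takes an operand consumes the next
   argument unconditionally, even if that argument looks like an option.  */
struct toy_link_args toy_parse (int argc, const char *const *argv)
{
  struct toy_link_args r = { NULL, NULL, NULL };
  assert (argv != NULL);
  for (int i = 0; i < argc; i++)
    {
      if (strcmp (argv[i], "-dynamic-linker") == 0 && i + 1 < argc)
        r.dynamic_linker = argv[++i];
      else if (strcmp (argv[i], "-o") == 0 && i + 1 < argc)
        r.output = argv[++i];
      else if (argv[i][0] != '-' && r.first_input == NULL)
        r.first_input = argv[i];
    }
  return r;
}

/* The failing order from the thread, shortened: -o follows -dynamic-linker.  */
struct toy_link_args toy_failing_order (void)
{
  static const char *const argv[] =
    { "-dynamic-linker", "-o", "conftest", "crt1.o", "conftest.o" };
  return toy_parse (5, argv);
}

/* The working order, shortened: -o comes last.  */
struct toy_link_args toy_working_order (void)
{
  static const char *const argv[] =
    { "-dynamic-linker", "crt1.o", "crti.o", "conftest.o", "-o", "conftest" };
  return toy_parse (6, argv);
}
```

In the failing order the scanner records "-o" as the dynamic linker, no output file, and "conftest" as the first input, which matches the "cannot find conftest" error. In the working order the output is set, but crt1.o has been swallowed as the dynamic linker path, so neither command line is really correct.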

Thanks,
Andrew

>
>
> Error:
> /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/collect2
> --sysroot=/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot
> --eh-frame-hdr -m elf_i386 -dynamic-linker -o conftest
> /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o
> /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crti.o
> /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtbegin.o
> -L/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc
> -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin
> -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/lib
> -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/lib
> -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib
> conftest.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc
> --as-needed -lgcc_s --no-as-needed
> /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtend.o
> /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crtn.o
> /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin/ld:
> cannot find conftest: No such file or directory
>
> No error:
> /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/collect2
> --sysroot=/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot
> --eh-frame-hdr -m elf_i386 -dynamic-linker
> /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o
> /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crti.o
> /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtbegin.o
> -L/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc
> -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin
> -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/lib
> -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/lib
> -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib
> conftest.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc
> --as-needed -lgcc_s --no-as-needed
> /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtend.o
> /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crtn.o
> -o conftest
>
>
> Any idea on why the parameter parsing fails? Is it a problem or is it
> the expected behavior?
>
> Thanks a lot,
> David


Re: WPA stream_out form & memory consumption

2014-04-02 Thread Jan Hubicka
> The previous email presents somewhat misleading graphs (influenced by
> --enable-gather-detailed-mem-stats).
> 
> Firefox:
> -flto=9, WPA peak: 8GB, LTRANS peak: 8GB
> -flto=4, WPA peak: 5GB, LTRANS peak: 3.5GB
> -flto=1, WPA peak: 3.5GB, LTRANS peak: ~1GB
> 
> These data show that parallel WPA streaming increases the short-term
> memory footprint by 4.5GB for -flto=9 (and by 1.5GB in the case
> of -flto=4).
> 
> For more details, please see the attachment.

Aha, --enable-gather-detailed-mem-stats maintains an on-the-side hashtable
tracking all ggc allocations, so it almost doubles memory use. That explains the
disproportion between GGC use and your graphs. Can you, perhaps, also get
Chromium graphs without detailed stats?
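
For reference, the overhead figures in the quoted message are plain differences against the -flto=1 WPA peak. A tiny C sketch of that arithmetic, using only the Firefox numbers quoted above (the function name is made up for illustration):

```c
#include <assert.h>

/* Extra WPA peak memory (in GB) attributable to parallel streaming,
   taken as the difference from the serial (-flto=1) WPA peak.  */
double wpa_parallel_overhead_gb (double wpa_peak_parallel_gb,
                                 double wpa_peak_serial_gb)
{
  assert (wpa_peak_parallel_gb >= wpa_peak_serial_gb);
  return wpa_peak_parallel_gb - wpa_peak_serial_gb;
}
```

With the quoted peaks, wpa_parallel_overhead_gb (8.0, 3.5) gives 4.5 for -flto=9 and wpa_parallel_overhead_gb (5.0, 3.5) gives 1.5 for -flto=4.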

Honza
> 
> Martin




Re: collect2 "-o" argument position problem

2014-04-02 Thread Jonathan Wakely
On 2 April 2014 23:26, David Guillen wrote:
> Hello guys,
>
> I don't know whether this is the best place to ask for this,

gcc-h...@gcc.gnu.org would have been better :-)