RE: VREGS fails to handle subreg of mem
Thank you for the feedback. I did find the issue in the mode_dependent_address_p hook.

//Claudiu

> -----Original Message-----
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Monday, March 31, 2014 10:21 PM
> To: Claudiu Zissulescu
> Cc: gcc@gcc.gnu.org; Francois Bedard; claz...@gmail.com
> Subject: Re: VREGS fails to handle subreg of mem
>
> > In our ARC port, we found the following situation after expand:
> >
> > (insn 23 22 24 5 (set (reg:SI 176)
> >         (subreg:SI (mem/c:DI (plus:SI (reg/f:SI 147 virtual-stack-vars)
> >                     (const_int -268 [0xfef4])) [3 tmpoutst.st_size+0 S8 A32]) 4)) t02.c:64 -1
> >      (nil))
> >
> > The virtual-stack-vars should be handled by GCC's VREGS step, in
> > instantiate_virtual_regs_in_insn(). However, this is not happening as
> > the subroutine is not designed to handle subregs of a mem. As a
> > consequence, virtual-stack-vars is not eliminated, and the compilation
> > fails later on.
> > To solve this issue, I am proposing the attached patch on vregs, that
> > implements handling of such situation by
> > instantiate_virtual_regs_in_insn().
> >
> > Can you please let me know if this is an acceptable solution for the
> > given issue?
>
> Very likely not, there should be no SUBREGs of MEMs after expand.
>
> --
> Eric Botcazou
-fleading-underscore is not working as expected.
Dear All,

I enabled the "-fleading-underscore" switch to emit global symbol names with a _ prefix.

The C source file:

int a = 10;
int b = 10, c;
int test()
{
  c = a + b;
  tes();
  return c;
}

and the resulting asm file:

        .global _a
        .section .data
        .align 1
        .type _a, %object
        .size _a, 2
_a:
        .word 10
        .global _b
        .align 1
        .type _b, %object
        .size _b, 2
_b:
        .word 10
        .comm _c, 2,2
        .section .text
        .align 1
        .global _test
        .type _test, %function
_test:
        ld HL, (a)
        ld DE, (b)
        add DE, HL
        ld (c), DE
        cal _tes
        ld DE, (c)
        ld WA, DE
        ret

As you can see in the asm, the global symbol names are prefixed with _ in the definitions, but not in the uses (a, b and c in the ld instructions).

I'm sure we are missing something here w.r.t. the -fleading-underscore flag. The gcc source is 4.8.1.

Any help will be appreciated.

Thank you
~Umesh
Help needed with zero/sign extension
One embarrassing feature of the moxie compiler port is that it really doesn't understand how to promote integral types. Moxie cores zero-extend all loads, but the compiler still shifts loaded values back and forth to zero out the upper bits.

So...

unsigned int foo (unsigned char *c)
{
  return *c;
}

..results in...

foo:
        ldi.l  $r1, 24
        ld.b   $r0, ($r0)
        ashl   $r0, $r1
        lshr   $r0, $r1
        ret

I thought the answer was to simply add something like this...

(define_insn "zero_extendqisi"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (zero_extend:SI (match_operand:QI 1 "register_operand" "r")))]
  ""
  "; ZERO EXTEND (comment for debugging)")

But nothing changes in the example above. However, the following code...

unsigned int p;
void foo (unsigned char *c)
{
  p = *c;
}

...does result in the correct output...

foo:
        ld.b   $r0, ($r0)
        ; ZERO EXTEND (comment for debugging)
        sta.l  p, $r0
        ret

Any advice? I'd really like to take care of this because the compiler output is pretty bloated right now.

Here's what I've been testing with. I'm not sure what I'm missing...

(define_insn "zero_extendqisi"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (zero_extend:SI (match_operand:QI 1 "register_operand" "r")))]
  ""
  "; ZERO EXTEND (comment for debugging)")

(define_expand "movqi"
  [(set (match_operand:QI 0 "general_operand" "")
        (match_operand:QI 1 "general_operand" ""))]
  ""
  "
{
  /* If this is a store, force the value into a register.  */
  if (MEM_P (operands[0]))
    operands[1] = force_reg (QImode, operands[1]);
}")

(define_insn "*movqi"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,W,A,r,r,B,r")
        (match_operand:QI 1 "moxie_general_movsrc_operand" "O,r,i,r,r,W,A,r,B"))]
  "register_operand (operands[0], QImode)
   || register_operand (operands[1], QImode)"
  "@
   xor    %0, %0
   mov    %0, %1
   ldi.b  %0, %1
   st.b   %0, %1
   sta.b  %0, %1
   ld.b   %0, %1
   lda.b  %0, %1
   sto.b  %0, %1
   ldo.b  %0, %1"
  [(set_attr "length" "2,2,6,2,6,2,6,6,6")])

Thanks!

AG
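As a side note on the message above, the redundancy of the ashl/lshr pair can be shown in plain C. This is only an illustrative sketch, not moxie or GCC code: the function names are made up for the example, and it assumes 32-bit registers and a shift count of 24 as in the listing.

```c
#include <stdint.h>

/* Models a byte load on a core that zero-extends all loads
   (hypothetical helper, stands in for ld.b).  */
uint32_t load_byte_zero_extended(const unsigned char *p)
{
    return *p;
}

/* What the extra instructions compute: ashl by 24, then lshr by 24.  */
uint32_t zext_by_shifts(uint32_t value)
{
    return (value << 24) >> 24;
}

/* The same result expressed as a mask of the low 8 bits.  */
uint32_t zext_by_mask(uint32_t value)
{
    return value & 0xffu;
}
```

Because the loaded value already has its upper 24 bits clear, applying zext_by_shifts to it changes nothing, which is why the two shift instructions are dead weight once the compiler knows that loads zero-extend.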
Re: Help needed with zero/sign extension
On 2 April 2014 13:08, Anthony Green wrote:
> I though the answer was to simply add something like this...
>
> (define_insn "zero_extendqisi"
>   [(set (match_operand:SI 0 "register_operand" "=r")
>         (zero_extend:SI (match_operand:QI 1 "register_operand" "r")))]
>   ""
>   "; ZERO EXTEND (comment for debugging)")

That pattern is obviously not outputting valid code. You should make this a define_insn_and_split, with an r/r alternative that is split (after reload) as necessary into shifts, and an m/r alternative that outputs a load. Sprinkle with rtx_cost adjustments as necessary.

> But nothing changes in the example above.

LOAD_EXTEND_OP can also avoid some unnecessary expansions, although we still have a long-standing issue of unnecessary extensions for narrow integer types passed in/out of functions, and loaded from volatile memory.
Re: WPA stream_out form & memory consumption
On 03/27/2014 10:48 AM, Martin Liška wrote:

The previous patch is wrong; I made a mistake in the name ;)

Martin

On 03/27/2014 09:52 AM, Martin Liška wrote:

On 03/25/2014 09:50 PM, Jan Hubicka wrote:

Hello,
I've been compiling Chromium with LTO and I noticed that WPA stream_out forks and runs in parallel: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02621.html. I am unable to fit in 16GB of memory: ld uses about 8GB and lto1 about 6GB. When WPA starts to fork, memory consumption increases so much that lto1 is killed. I would appreciate a --param option to disable this WPA fork. The number of forks is taken from the build system (-flto=9), which is fine for the ltrans phase because LD releases the aforementioned 8GB.

What do you think about that?

I can take a look - our measurements suggested that the WPA memory will later be dominated by ltrans. Perhaps Chromium does something that makes WPA explode; that would be interesting to analyze. I did not manage to get through the Chromium LTO build process recently (ninja builds are not my friends), can you send me the instructions?

Honza

Thanks,
Martin

Here are instructions for building Chromium with LTO:

1) install depot-tools and export the PATH variable according to the guide: http://www.chromium.org/developers/how-tos/install-depot-tools
2) Checkout source code: gclient sync; cd src
3) Apply patch (enables system gold linker and disables LTO for a sandbox that uses top-level asm)
4) which ld should point to ld.gold
5) ensure that ld.bfd points to ld.bfd
6) run: build/gyp_chromium -Dwerror=
7) ninja -C out/Release chrome -jX

If there are any problems, follow: https://code.google.com/p/chromium/wiki/LinuxBuildInstructions

Martin

Hello,
taking latest trunk gcc, I built Firefox and Chromium. Both projects were compiled without debugging symbols and with -O2 on an 8-core machine.

Firefox:
-flto=9, peak memory usage (in LTRANS): 11GB

Chromium:
-flto=6, peak memory usage (in parallel WPA phase): 16.5GB

For details, please see the attachment with graphs. The attachment also contains the -fmem-report and -fmem-report-wpa output.
I think the reduced memory footprint of ~3.5GB is a bit optimistic: http://gcc.gnu.org/gcc-4.9/changes.html

Is there any way we can reduce the memory footprint?

Attachment (due to size restriction): https://drive.google.com/file/d/0B0pisUJ80pO1bnV5V0RtWXJkaVU/edit?usp=sharing

Thank you,
Martin
Access Error in classify_object_over_fdes on sparc-rtems
Hi

I am sure this is a simple mistake in our linker script, but what magic incantation (symbols, sections, end marker, etc.) is assumed to be properly constructed before this method works?

A pointer to the right magic in the standard sparc-elf linker script would likely be sufficient for me to fix this. An explanation of what it is accomplishing would also be appreciated.

Thanks.

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985
HARD_REGNO_CALL_PART_CLOBBERED and regs_invalidated_by_call
Hi Richard,

As part of implementing the new O32 FPXX ABI I am making use of the HARD_REGNO_CALL_PART_CLOBBERED macro to allow odd-numbered floating-point registers to be considered as 'normally' callee-saved but call clobbered if they are being used to hold SImode or SFmode data. The macro is implemented as:

/* Odd numbered single precision registers are not considered call saved
   for O32 FPXX as they will be clobbered when run on an FR=1 FPU.  */
#define HARD_REGNO_CALL_PART_CLOBBERED(REGNO, MODE)             \
  (TARGET_FLOATXX && ((MODE) == SFmode || (MODE) == SImode)     \
   && FP_REG_P (REGNO) && (REGNO & 1))

IRA and LRA appear to work correctly w.r.t. HARD_REGNO_CALL_PART_CLOBBERED and I get the desired O32 FPXX ABI behaviour. However, when writing a number of tests for this I triggered some optimisations (in particular regcprop) which ignored the fact that the odd-numbered single-precision registers are clobbered across calls and essentially undid the work IRA/LRA did in treating the register as clobbered.

The reason for regcprop ignoring the call-clobbered nature of these registers is that it simply does not check. The test for call-clobbered registers relies solely on regs_invalidated_by_call, which is not (and cannot be) aware of the HARD_REGNO_CALL_PART_CLOBBERED macro, as it has no information about what mode registers are in when it is used. A proposed fix for this specific issue is inline below.

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 101de76..cb2937c 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -1030,8 +1030,10 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
             }
         }
 
-      EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, regno, hrsi)
-        if (regno < set_regno || regno >= set_regno + set_nregs)
+      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+        if ((TEST_HARD_REG_BIT (regs_invalidated_by_call, regno)
+             || HARD_REGNO_CALL_PART_CLOBBERED (regno, vd->e[regno].mode))
+            && (regno < set_regno || regno >= set_regno + set_nregs))
           kill_value_regno (regno, 1, vd);
 
       /* If SET was seen in CALL_INSN_FUNCTION_USAGE, and SET_SRC

The problem is that there are several other passes that rely solely on regs_invalidated_by_call to determine call-clobbered status and will therefore make the same mistake. Some of these passes simply don't have mode information around when handling call-clobbered registers, which leaves me a little unsure of the best solution in those cases. I would expect that being over-cautious and always marking a potentially clobbered register as clobbered is one option, but there is a risk that doing so could break a legitimate use of a callee-saved register (in a mode that is not part clobbered).

Essentially I would propose introducing another register set 'regs_maybe_invalidated_by_call' that includes everything in regs_invalidated_by_call plus any register for which HARD_REGNO_CALL_PART_CLOBBERED reports true when checking all registers against all modes. Wherever call-clobbered information is required but mode information is unavailable, regs_maybe_invalidated_by_call would then be used. As I said though, there are probably some corner cases to handle too.

I don't quite have the O32 FPXX patches ready to send out yet, but this issue is relevant to all architectures using HARD_REGNO_CALL_PART_CLOBBERED; presumably nobody has hit it yet though.

Regards,
Matthew
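The proposed 'regs_maybe_invalidated_by_call' set could be computed roughly as sketched below. This is a standalone toy model under stated assumptions, not GCC code: the register count, the three modes, the 32-bit set type and every toy_* name are hypothetical stand-ins for FIRST_PSEUDO_REGISTER, the machine modes, HARD_REG_SET and the real macro.

```c
#include <stdint.h>

/* Toy model only: 32 hard registers, of which 16..31 are FP registers,
   and three machine modes.  None of these names exist in GCC.  */
enum toy_mode { TOY_SImode, TOY_SFmode, TOY_DFmode, TOY_NUM_MODES };

#define TOY_NUM_HARD_REGS 32
#define TOY_FIRST_FP_REG  16

typedef uint32_t toy_hard_reg_set;   /* one bit per hard register */

/* Stand-in for HARD_REGNO_CALL_PART_CLOBBERED: odd-numbered FP registers
   are clobbered across calls when they hold SImode or SFmode data.  */
static int toy_call_part_clobbered(unsigned regno, enum toy_mode mode)
{
    return regno >= TOY_FIRST_FP_REG
           && (regno & 1)
           && (mode == TOY_SImode || mode == TOY_SFmode);
}

/* Conservative superset: everything already invalidated by a call, plus
   any register that is part-clobbered in at least one mode.  */
toy_hard_reg_set toy_regs_maybe_invalidated_by_call(toy_hard_reg_set invalidated)
{
    toy_hard_reg_set result = invalidated;

    for (unsigned regno = 0; regno < TOY_NUM_HARD_REGS; regno++)
        for (int mode = 0; mode < TOY_NUM_MODES; mode++)
            if (toy_call_part_clobbered(regno, (enum toy_mode) mode))
                result |= (toy_hard_reg_set) 1 << regno;

    return result;
}
```

The set only needs to be computed once, since it depends on the target description and not on the function being compiled. The trade-off is the one described above: a pass that uses it without mode information treats an odd FP register as clobbered even when it holds a value in a mode that survives the call.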
Re: Help needed with zero/sign extension
On 04/02/14 06:08, Anthony Green wrote:
> One embarrassing feature of the moxie compiler port is that it really
> doesn't understand how to promote integral types. Moxie cores
> zero-extend all loads, but the compiler still shifts loaded values back
> and forth to zero out the upper bits.

I'm a bit surprised LOAD_EXTEND_OP doesn't cover this for you. Maybe other aspects of the moxie are getting in the way:

(insn 7 6 8 2 (set (reg:SI 32)
        (const_int 24 [0x18])) j.c:4 19 {*movsi}
     (nil))
(insn 8 7 10 2 (set (reg:SI 30 [ D.1371 ])
        (ashift:SI (subreg:SI (mem:QI (reg:SI 2 $r0 [ c ]) [0 *c_2(D)+0 S1 A8]) 0)
            (reg:SI 32))) j.c:4 14 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 2 $r0 [ c ])
        (nil)))
(note 10 8 15 2 NOTE_INSN_DELETED)
(insn 15 10 16 2 (set (reg/i:SI 2 $r0)
        (lshiftrt:SI (reg:SI 30 [ D.1371 ])
            (reg:SI 32))) j.c:5 16 {lshrsi3}
     (expr_list:REG_DEAD (reg:SI 32)
        (expr_list:REG_DEAD (reg:SI 30 [ D.1371 ])
            (nil

Looks problematical. The shift count is used twice, so combine's going to have a bit of a tough time with this.

Perhaps allow constant shift counts then force them into registers after combine with splitters?

jeff
Re: Help needed with zero/sign extension
Joern Rennecke writes: > On 2 April 2014 13:08, Anthony Green wrote: > >> I though the answer was to simply add something like this... >> >> (define_insn "zero_extendqisi" >> [(set (match_operand:SI 0 "register_operand" "=r") >> (zero_extend:SI (match_operand:QI 1 "register_operand" "r")))] >> "" >> "; ZERO EXTEND (comment for debugging)") > > That pattern is obviously not outputting valid code. It's actually just a valid assembler comment so I can see if the pattern is used. > You should make this a define_insn_and_split, with an r/r alternative > that is split (after reload) as necesary into shifts, and an m/r alternative > that outputs a load. sprinkle with rtx_cost adjustments as necessary. Thanks for the tip. I have it working now! AG
Re: [patch] Fix texinfo warnings for doc/gcc.texi [was: Re: doc bugs]
*PING*

Tobias Burnus wrote:

H.J. Lu wrote:

On Fri, Mar 28, 2014 at 12:41 PM, Mike Stump wrote:

Since we are nearing release, I thought I'd mention I see:

../../gcc/gcc/doc/invoke.texi:1114: warning: node next `Overall Options' in menu `C Dialect Options' and in sectioning `Invoking G++' differ

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59055

I think one reason that there are (and were) that many warnings is that texinfo only recently gained support for diagnosing these issues. (Or maybe not that recently, but distributions were slow in adopting newer texinfo versions.)

Attached is a warning-removal patch.
OK for the trunk?

Regarding invoke.texi: It had (nearly) the same @menu twice, once under @chapter where it belongs and once under a @section where it doesn't.

Tobias
Re: Help needed with zero/sign extension
Jeff Law writes: > On 04/02/14 06:08, Anthony Green wrote: >> >> One embarrassing feature of the moxie compiler port is that it really >> doesn't understand how to promote integral types. Moxie cores >> zero-extend all loads, but the compiler still shifts loaded values back >> and forth to zero out the upper bits. > I'm a bit surprised LOAD_EXTEND_OP doesn't cover this for you. Maybe > other aspects of the moxie are getting in the way: > > > (insn 7 6 8 2 (set (reg:SI 32) > (const_int 24 [0x18])) j.c:4 19 {*movsi} > (nil)) > (insn 8 7 10 2 (set (reg:SI 30 [ D.1371 ]) > (ashift:SI (subreg:SI (mem:QI (reg:SI 2 $r0 [ c ]) [0 > *c_2(D)+0 S1 A8]) 0) > (reg:SI 32))) j.c:4 14 {ashlsi3} > (expr_list:REG_DEAD (reg:SI 2 $r0 [ c ]) > (nil))) > (note 10 8 15 2 NOTE_INSN_DELETED) > (insn 15 10 16 2 (set (reg/i:SI 2 $r0) > (lshiftrt:SI (reg:SI 30 [ D.1371 ]) > (reg:SI 32))) j.c:5 16 {lshrsi3} > (expr_list:REG_DEAD (reg:SI 32) > (expr_list:REG_DEAD (reg:SI 30 [ D.1371 ]) > (nil > > > Looks problematical. The shift count is used twice, so combine's > going to have a bit of a tough time with this. > > Perhaps allow constant shift counts then force them into registers > after combine with splitters? Rather than use shifts, I've added "sign-extend byte" and "sign-extend short" instructions (I have the luxury of a soft-core architecture with a tiny user base :). Switching char to unsigned by default was also a good thing given zero-extend by default. I've tested RTEMS apps on QEMU, so it's all good so far. I'll submit patches tonight. AG
Re: -fleading-underscore is not working as expected.
On Wed, Apr 2, 2014 at 2:15 AM, Umesh Kalappa wrote: > > Was enabled the switch "-fleading-underscore" to emit the global > symbol name with prefix _ . > ld HL, (a) > > ld DE, (b) > > add DE, HL > > ld (c), DE > > cal _tes > > ld DE, (c) > > ld WA, DE > > ret > > > if you see the asm ,the global symbol names was prefixed with _ in the > definition ,But not in the uses. > > I'm sure we are missing something here w.r.t -fleading-underscore flag > and gcc source is 4.8.1. You didn't mention which target you are using, and I don't recognize it. If this is a private target, your backend files are missing some uses of %U (%U is a directive for asm_fprintf). Ian
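The point about symbol uses needing the same prefix as definitions can be pictured with a small standalone sketch. It is not GCC code and does not use asm_fprintf: toy_emit_load and its parameters are hypothetical names, and the instruction syntax is simply copied from the listing quoted above. In a real backend, the %U directive mentioned above supplies the prefix.

```c
#include <stdio.h>
#include <string.h>

/* Buffer for the most recently formatted line (toy example only).  */
static char toy_line[64];

/* Hypothetical helper: format a load from a global symbol, applying the
   user label prefix to the symbol in the operand.  Returns NULL if the
   line would not fit in the buffer.  */
const char *toy_emit_load(const char *label_prefix, const char *reg,
                          const char *symbol)
{
    /* sizeof "ld , ()" counts the fixed characters plus the final NUL.  */
    if (strlen(label_prefix) + strlen(reg) + strlen(symbol)
        + sizeof "ld , ()" > sizeof toy_line)
        return NULL;

    snprintf(toy_line, sizeof toy_line, "ld %s, (%s%s)",
             reg, label_prefix, symbol);
    return toy_line;
}
```

With a prefix of "_" the operand becomes (_a), matching the _a: label in the definition. With an empty prefix the output is the same as in the listing that prompted the question.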
collect2 "-o" argument position problem
Hello guys, I don't know whether this is the best place to ask for this, but anyway, here we go: I have two different commandlines for collect2 (I got them after using -v in gcc) and I found out that the original one does not work because of the position in the parameter list. Error: /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/collect2 --sysroot=/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot --eh-frame-hdr -m elf_i386 -dynamic-linker -o conftest /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crti.o /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtbegin.o -L/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/lib -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/lib -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib conftest.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtend.o /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crtn.o /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin/ld: cannot find conftest: No such file or directory No error: /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/collect2 
--sysroot=/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot --eh-frame-hdr -m elf_i386 -dynamic-linker /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crti.o /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtbegin.o -L/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/lib -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/lib -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib conftest.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtend.o /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crtn.o -o conftest Any idea on why the parameter parsing fails? Is it a problem or is it the expected behavior? Thanks a lot, David
Re: WPA stream_out form & memory consumption
> > Hello,
> taking latest trunk gcc, I built Firefox and Chromium. Both
> projects compiled without debugging symbols and -O2 on an 8-core
> machine.
>
> Firefox:
> -flto=9, peak memory usage (in LTRANS): 11GB
>
> Chromium:
> -flto=6, peak memory usage (in parallel WPA phase ): 16.5GB

I see, the ltrans memory use is however about the same later in the game.

> For details please see attached with graphs. The attachment contains
> also -fmem-report and -fmem-report-wpa.
> I think reduced memory footprint to ~3.5GB is a bit optimistic:
> http://gcc.gnu.org/gcc-4.9/changes.html

I will need to re-measure my setup - it is what I got last time with basically the same configuration. It depends on parallelism; you should get a sub-4GB peak with -flto=1, right? We should clarify this in changes.html.

> Is there any way we can reduce the memory footprint?

Looking at the memreport we get for ggc memory:

Chromium:
cgraph.c:869 (cgraph_create_edge_1)                   0: 0.0%          0: 0.0%  274319552: 4.8%         0: 0.0%    2637688
cgraph.c:510 (cgraph_allocate_node)                   0: 0.0%          0: 0.0%  426228128: 7.5%         0: 0.0%    1299476
toplev.c:960 (realloc_for_line_map)                   0: 0.0%  357908640: 3.8% 1073743896:18.8%       184: 0.0%         10
tree-streamer-in.c:621 (streamer_alloc_tree)  216054000:86.6% 7623611824:80.2% 2536849136:44.5%  57818592:36.0%   69421368
Total                                         249562346       9504578411       5700671942       160593619         97146243
source location                               Garbage         Freed            Leak             Overhead          Times

Firefox:
cgraph.c:869 (cgraph_create_edge_1)                   0: 0.0%          0: 0.0%  130358176: 6.9%         0: 0.0%    1253444
cgraph.c:510 (cgraph_allocate_node)                   0: 0.0%          0: 0.0%  182236800: 9.7%         0: 0.0%     555600
toplev.c:960 (realloc_for_line_map)                   0: 0.0%   89503888: 5.5%  268468240:14.3%       160: 0.0%         13
tree-streamer-in.c:621 (streamer_alloc_tree)   93089976:77.5%  972848816:59.6%  639230248:33.9%  21332480:32.3%   13496198
Total                                         120076578       1632997043       1883064062        65981723         24732501
source location                               Garbage         Freed            Leak             Overhead          Times

So Chromium uses quite a lot more trees and also seems to have about twice as many functions. Next time, it is useful to include -Q while collecting the data - it shows individual GGC runs and also memory usage accounted per pass. That way we would know if there are a lot more functions to start with, or just more inlining going on.

I have an older patch that introduces a cache to line map streaming, reducing its size quite a bit; that should save some of the memory used by realloc_for_line_map. I will dig it out and update it to the current tree.

I also wonder where the rest of the memory goes, since the graphs show about 10GB for Firefox. Some is probably accounting of mmap files, also gold's memory usage.

We collect only some of the memory usage that is not in ggc.

Vectors:

Chromium:
ipa-cp.c:2421 (grow_edge_clone_vectors)           17225752: 6.9%   17225752        1: 0.0%
vec.h:1393 (copy)                                 17291228: 6.9%  100465316  1499009: 3.7%
lto-cgraph.c:141 (lto_symtab_encoder_encode)      30436272:12.2%   53192752     1460: 0.0%
passes.c:2254 (execute_one_pass)                  53853360:21.6%   83885960  1426939: 3.5%
ipa-inline-analysis.c:974 (inline_summary_alloc)  84406056:33.8%  137806000   484472: 1.2%
Total                                            249721648                  40747241

Firefox:
ipa-cp.c:2421 (grow_edge_clone_vectors)            7753312: 6.1%    7753312        1: 0.0%
ipa-inline-analysis.c:4053 (read_inline_edge_sum   8758216: 6.9%   26420804   909584: 4.9%
ipa-ref.c:54 (ipa_record_reference)               10747880: 8.4%   20943288   371083: 2.0%
lto-cgraph.c:141 (lto_symtab_encoder_encode)      19756008:15.5%   23548272     1335: 0.0%
passes.c:2254 (execute_one_pass)                  26769688:21.0%   41942904   716378: 3.9%
ipa-inline-analysis.c:974 (inline_summary_alloc)  40110248:31.5%   62026480   284283: 1.5%
Total                                            127480444                  18430703

That seems as usual. 249MB seems acceptable.

Bitmaps seem to be dominated by ipa-reference. On Chromium this pass seems to go crazy, having about 80MB of bitmaps. Perhaps you could try to get data with -fno-ipa-reference?

We ought to get stats on hashtables, since these probably consume quite some memory during LTO streaming.

Could you perhaps also get -flto-report?

Honza
Re: collect2 "-o" argument position problem
On Wed, Apr 2, 2014 at 3:26 PM, David Guillen wrote: > Hello guys, > > I don't know whether this is the best place to ask for this, but > anyway, here we go: > > I have two different commandlines for collect2 (I got them after using > -v in gcc) and I found out that the original one does not work because > of the position in the parameter list. The simple answer is -dynamic-linker takes an operand. So in the first case, the operand to dynamic-linker is -o and in the second case it is /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o . Both seems wrong. How is GCC being invoked here? Do you have -Wl,-dynamic-linker on the command line? Thanks, Andrew > > > Error: > /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/collect2 > --sysroot=/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot > --eh-frame-hdr -m elf_i386 -dynamic-linker -o conftest > /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o > /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crti.o > /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtbegin.o > -L/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc > -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin > -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/lib > -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/lib > -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib > conftest.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc > --as-needed -lgcc_s --no-as-needed > 
/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtend.o > /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crtn.o > /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin/ld: > cannot find conftest: No such file or directory > > No error: > /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/collect2 > --sysroot=/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot > --eh-frame-hdr -m elf_i386 -dynamic-linker > /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crt1.o > /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crti.o > /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtbegin.o > -L/home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc > -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/bin > -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/lib > -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/lib > -L/home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib > conftest.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc > --as-needed -lgcc_s --no-as-needed > /home/david/uclibc/uclibc-buildroot-custom/output/build/host-gcc-final-gcc-4_7_3-release/build/./gcc/crtend.o > /home/david/uclibc/uclibc-buildroot-custom/output/host/usr/i686-buildroot-linux-gnu/sysroot/usr/lib/crtn.o > -o conftest > > > Any idea on why the parameter parsing fails? Is it a problem or is it > the expected behavior? > > Thanks a lot, > David
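The effect described in the reply above, where an option that takes an operand swallows whatever token follows it, can be reproduced with a toy parser. This is a simplified sketch and not how ld or collect2 actually parse their command lines: toy_find_output, the two arrays and the shortened file names are all made up for illustration.

```c
#include <string.h>

/* Shortened versions of the two command lines from the report
   (hypothetical, with paths stripped).  */
static const char *const failing_cmd[] = {
    "-dynamic-linker", "-o", "conftest", "crt1.o"
};
static const char *const working_cmd[] = {
    "-dynamic-linker", "crt1.o", "conftest.o", "-o", "conftest"
};

/* Returns the index of the token taken as the output file name,
   or -1 if no -o option was recognised.  */
int toy_find_output(int argc, const char *const argv[])
{
    int output = -1;

    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "-dynamic-linker") == 0) {
            /* This option takes an operand: the next token is consumed
               unconditionally, even if it looks like another option.  */
            i++;
        } else if (strcmp(argv[i], "-o") == 0 && i + 1 < argc) {
            output = ++i;
        }
    }
    return output;
}
```

In the failing order, "-o" is eaten as the dynamic linker path, so "conftest" is left over as an ordinary input file, which is consistent with the "cannot find conftest" error. In the working order, crt1.o is eaten instead and -o is parsed normally, which is also not what was intended.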
Re: WPA stream_out form & memory consumption
> Previous email presents a bit misleading graphs (influenced by
> --enable-gather-detailed-mem-stats).
>
> Firefox:
> -flto=9, WPA peak: 8GB, LTRANS peak: 8GB
> -flto=4, WPA peak: 5GB, LTRANS peak: 3.5GB
> -flto=1, WPA peak: 3.5GB, LTRANS peak: ~1GB
>
> These data shows that parallel WPA streaming increases short-time
> memory footprint by 4.5GB for -flto=9 (respectively by 1.5GB in case
> of -flto=4).
>
> For more details, please see the attachment.

Aha, --enable-gather-detailed-mem-stats maintains an on-the-side hash table tracking all ggc allocations, so it almost doubles memory use. That explains the discrepancy between GGC use and your graphs.

Can you, perhaps, also get Chromium graphs without detailed stats?

Honza

> Martin
Re: collect2 "-o" argument position problem
On 2 April 2014 23:26, David Guillen wrote: > Hello guys, > > I don't know whether this is the best place to ask for this, gcc-h...@gcc.gnu.org would have been better :-)