On Tue, Oct 15, 2013 at 2:41 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, Oct 15, 2013 at 2:10 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Tue, Oct 15, 2013 at 1:12 AM, Mike Stump <mikest...@comcast.net> wrote:
>>> So, here is a comparison of the time required to do a make -j15 of a
>>> --disable-bootstrap --enable-checking=none --enable-languages=c,c++ style
>>> compiler. The base compiler is a --enable-checking=none
>>> --enable-languages=c,c++,lto style compiler, which is
>>> 1b2bf75690af8115739ebba710a44d05388c7a1a (aka trunk@202797) from git. The
>>> wide branch compiler is 4529820913813b810860784382f975ea8e6be61d (aka
>>> wide-int@203462) from git. The software compiled in both cases is the base
>>> compiler described above.
>>>
>>> Net result, around 2.6% regression in user time, and 0.4% in elapsed time.
>>> The raw data is below, just in case one is interested. This is on Ubuntu
>>> 12.04.3 system with 12GB ram with 8 cores.
>>
>> Btw, more interesting are testcases that put a heavy load on the alias
>> machinery, like (many) (nested) loops with a lot of memory references.
>> Like the testcase in PR39326. If you profile that you will see some
>> of the double_int routines high in the profile which means on the
>> branch wide_int routines should start to show up.
>>
>> I didn't expect visible differences for a bootstrap, but you proved me
>> wrong :( Btw, with parallel make a single file getting a lot slower can
>> be masked by parallelism completely, so I take timings with -j
>> with a grain of salt.
>
> For example for get_ref_base_and_extent the adds to bit_offset
> (even though initially of addr_wide_int kind) end up unoptimized,
> exposing
>
>   if (len_822 > 2)
>     goto <bb 96>;
>   else
>     goto <bb 94>;
>
>   <bb 94>:
>   xprecision_819 = (unsigned int) D.54901_818;
>   if (xprecision_819 > 127)
>     goto <bb 96>;
>   else
>     goto <bb 95>;
>
>   <bb 95>:
>   D.54899_838 = D.54922_816->base.u.bits.unsigned_flag;
>   D.54900_839 = (signop) D.54899_838;
>   len_840 = wi::force_to_size (&MEM[(struct wide_int_ref_storage *)&yi].scratch, val_823, len_822, xprecision_819, 128, D.54900_839);
>
>   <bb 96>:
>   # val_1543 = PHI <val_823(93), &MEM[(struct wide_int_ref_storage *)&yi].scratch(95), val_823(94)>
>   # len_1542 = PHI <2(93), len_840(95), len_822(94)>
>   MEM[(struct generic_wide_int *)&yi].val = val_1543;
>   MEM[(struct generic_wide_int *)&yi].len = len_1542;
>   MEM[(struct generic_wide_int *)&yi].precision = 128;
>   D.54871_813 = wi::add_large (&MEM[(struct fixed_wide_int_storage *)&D.54875].D.43191.val, &MEM[(const struct fixed_wide_int_storage *)&bit_offset].val, D.54872_808, val_1543, len_1542, 128, 1, 0B);
>   MEM[(unsigned int *)&D.54875 + 24B] = D.54871_813;
>   __builtin_memcpy (&bit_offset, &D.54875, 28);
>   goto <bb 284> (<L141>);

That was built with host G++ 4.6; with trunk it is more obvious:

  <bb 71>:
  # SR.574_214 = PHI <_507(69), &MEM[(struct wide_int_ref_storage *)&yi].scratch(70), _507(68)>
  # SR.575_810 = PHI <len_503(69), len_502(70), len_503(68)>
  MEM[(struct generic_wide_int *)&yi] = SR.574_214;
  MEM[(struct generic_wide_int *)&yi + 8B] = SR.575_810;
  MEM[(struct generic_wide_int *)&yi + 12B] = 128;
  _468 = wi::add_large (&MEM[(struct fixed_wide_int_storage *)&D.52085].val, &MEM[(const struct fixed_wide_int_storage *)&bit_offset].val, _463, SR.574_214, SR.575_810, 128, 1, 0B);
  MEM[(unsigned int *)&D.52085 + 24B] = _468;
  yi ={v} {CLOBBER};
  MEM[(struct generic_wide_int *)&bit_offset] = MEM[(struct generic_wide_int *)&D.52085];
  D.52085 ={v} {CLOBBER};
  goto <bb 277> (<L142>);

Even though yi dies after the call to wi::add_large, we cannot remove
the pointless initializations of its members, because its address escapes.

Richard.
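
The escaping-address point can be reduced to a small C++ sketch. Nothing in it is taken from the wide-int branch: `int_ref`, `add_limbs` and `add_small` are hypothetical stand-ins for the `yi` wrapper, `wi::add_large` and the caller in the dumps above. It assumes the callee is opaque to the optimizer (out of line, as `wi::add_large` is); whether a particular compiler version actually keeps the stores will depend on its alias and interprocedural analysis.

```cpp
#include <cstdint>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

// Hypothetical stand-in for the reference wrapper seen as `yi` in the dumps.
struct int_ref {
    const std::int64_t *val;   // points at caller data or at scratch below
    unsigned len;
    unsigned precision;
    std::int64_t scratch[2];
};

// Hypothetical stand-in for wi::add_large: adds limbs pairwise (no carry
// handling) and returns the result length. Kept out of line so the caller
// sees only a call, as in the dumps.
NOINLINE unsigned add_limbs(std::int64_t *out,
                            const std::int64_t *a, unsigned alen,
                            const std::int64_t *b, unsigned blen)
{
    unsigned n = alen > blen ? alen : blen;
    for (unsigned i = 0; i < n; ++i) {
        std::int64_t x = i < alen ? a[i] : 0;
        std::int64_t y = i < blen ? b[i] : 0;
        out[i] = x + y;
    }
    return n;
}

std::int64_t add_small(const std::int64_t *acc, unsigned acc_len,
                       std::int64_t small)
{
    int_ref yi;
    yi.scratch[0] = small;
    yi.val = yi.scratch;   // a pointer into yi: yi's address now escapes
    yi.len = 1;
    yi.precision = 128;    // nothing reads this after the call, but with yi
                           // escaped the compiler must assume the callee might

    std::int64_t out[2] = {0, 0};
    add_limbs(out, acc, acc_len, yi.val, yi.len);
    return out[0];         // yi is dead here; only the low limb is returned
}
```

Only `yi.val` and `yi.len` are needed for the call, and they could live in registers. Because a pointer into `yi` is passed to an opaque callee, the stores to the other members are treated as potentially observable and are kept.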