Re: wide-int branch timings

Richard Biener Tue, 15 Oct 2013 05:49:46 -0700

On Tue, Oct 15, 2013 at 2:41 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, Oct 15, 2013 at 2:10 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Tue, Oct 15, 2013 at 1:12 AM, Mike Stump <mikest...@comcast.net> wrote:
>>> So, here is a comparison of the time required to do a make -j15 of a 
>>> --disable-bootstrap --enable-checking=none --enable-languages=c,c++ style 
>>> compiler.  The base compiler is a --enable-checking=none 
>>> --enable-languages=c,c++,lto style compiler, which is 
>>> 1b2bf75690af8115739ebba710a44d05388c7a1a (aka trunk@202797) from git.  The 
>>> wide branch compiler is 4529820913813b810860784382f975ea8e6be61d (aka 
>>> wide-int@203462) from git.  The software compiled in both cases is the base 
>>> compiler described above.
>>>
>>> Net result, around 2.6% regression in user time, and 0.4% in elapsed time.  
>>> The raw data is below, just in case one is interested.  This is on Ubuntu 
>>> 12.04.3 system with 12GB ram with 8 cores.
>>
>> Btw, more interesting are testcases that put a heavy load on the alias
>> machinery, like (many) (nested) loops with a lot of memory references.
>> Like the testcase in PR39326.  If you profile that you will see some
>> of the double_int routines high in the profile which means on the
>> branch wide_int routines should start to show up.
>>
>> I didn't expect visible differences for a bootstrap, but you proved me
>> wrong :(  Btw, with parallel make a single file getting a lot slower can
>> be masked by parallelism completely, so I take timings with -j
>> with a grain of salt.
>
> For example for get_ref_base_and_extent the adds to bit_offset
> (even though initially of addr_wide_int kind) end up unoptimized,
> exposing
>
>   if (len_822 > 2)
>     goto <bb 96>;
>   else
>     goto <bb 94>;
>
> <bb 94>:
>   xprecision_819 = (unsigned int) D.54901_818;
>   if (xprecision_819 > 127)
>     goto <bb 96>;
>   else
>     goto <bb 95>;
>
> <bb 95>:
>   D.54899_838 = D.54922_816->base.u.bits.unsigned_flag;
>   D.54900_839 = (signop) D.54899_838;
>   len_840 = wi::force_to_size (&MEM[(struct wide_int_ref_storage *)&yi].scratch, val_823, len_822, xprecision_819, 128, D.54900_839);
>
> <bb 96>:
>   # val_1543 = PHI <val_823(93), &MEM[(struct wide_int_ref_storage *)&yi].scratch(95), val_823(94)>
>   # len_1542 = PHI <2(93), len_840(95), len_822(94)>
>   MEM[(struct generic_wide_int *)&yi].val = val_1543;
>   MEM[(struct generic_wide_int *)&yi].len = len_1542;
>   MEM[(struct generic_wide_int *)&yi].precision = 128;
>   D.54871_813 = wi::add_large (&MEM[(struct fixed_wide_int_storage *)&D.54875].D.43191.val, &MEM[(const struct fixed_wide_int_storage *)&bit_offset].val, D.54872_808, val_1543, len_1542, 128, 1, 0B);
>   MEM[(unsigned int *)&D.54875 + 24B] = D.54871_813;
>   __builtin_memcpy (&bit_offset, &D.54875, 28);
>   goto <bb 284> (<L141>);


That was built with host G++ 4.6; with trunk you see it more clearly:

  <bb 71>:
  # SR.574_214 = PHI <_507(69), &MEM[(struct wide_int_ref_storage *)&yi].scratch(70), _507(68)>
  # SR.575_810 = PHI <len_503(69), len_502(70), len_503(68)>
  MEM[(struct generic_wide_int *)&yi] = SR.574_214;
  MEM[(struct generic_wide_int *)&yi + 8B] = SR.575_810;
  MEM[(struct generic_wide_int *)&yi + 12B] = 128;
  _468 = wi::add_large (&MEM[(struct fixed_wide_int_storage *)&D.52085].val, &MEM[(const struct fixed_wide_int_storage *)&bit_offset].val, _463, SR.574_214, SR.575_810, 128, 1, 0B);
  MEM[(unsigned int *)&D.52085 + 24B] = _468;
  yi ={v} {CLOBBER};
  MEM[(struct generic_wide_int *)&bit_offset] = MEM[(struct generic_wide_int *)&D.52085];
  D.52085 ={v} {CLOBBER};
  goto <bb 277> (<L142>);

Even though yi dies after the call to wi::add_large, we cannot remove the
pointless initializations of its members, because its address escapes
(a pointer to yi's scratch buffer can reach wi::add_large through the PHI).
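
To make the escape problem concrete, here is a minimal sketch of the same
shape outside of GCC. Everything in it is hypothetical: ref_like stands in
for wide_int_ref_storage, sum_blocks for an out-of-line routine such as
wi::add_large, and SKETCH_NOINLINE is just a helper macro for the sketch.
Whether a given compiler actually keeps the three stores depends on its
version and on how much it can see of the callee; the point is only that it
must assume the callee may read yi once a pointer into yi has been passed.

```cpp
#include <cassert>
#include <cstdint>

// Helper macro for this sketch only; noinline models an opaque call.
#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical stand-in for wide_int_ref_storage: a view (val, len,
// precision) plus a scratch buffer that val may point into.
struct ref_like {
  const std::int64_t *val;
  unsigned len;
  unsigned precision;
  std::int64_t scratch[2];
};

// Hypothetical stand-in for an out-of-line routine such as wi::add_large.
// It only reads val[0..len), but a caller that cannot see this body has to
// assume it might read any part of an object whose address it was given.
SKETCH_NOINLINE std::int64_t sum_blocks(const std::int64_t *val, unsigned len) {
  assert(len >= 1 && len <= 2);
  std::int64_t sum = 0;
  for (unsigned i = 0; i < len; ++i)
    sum += val[i];
  return sum;
}

std::int64_t use_ref(std::int64_t x, bool extend) {
  ref_like yi;
  const std::int64_t *val = &x;
  unsigned len = 1;
  if (extend) {
    // Slow path: materialize a sign-extended copy in yi.scratch.
    yi.scratch[0] = x;
    yi.scratch[1] = x < 0 ? -1 : 0;
    val = yi.scratch;  // from here on a pointer into yi exists
    len = 2;
  }
  // These three stores are never read again in this function ...
  yi.val = val;
  yi.len = len;
  yi.precision = 128;
  // ... but on the slow path a pointer into yi is passed to the call, so
  // yi counts as escaped and the stores cannot simply be dropped as dead.
  return sum_blocks(val, len);
}
```

Calling use_ref (5, false) takes the fast path with a single block, while
use_ref (-3, true) goes through yi.scratch with two blocks. In both cases
the stores to yi.val, yi.len and yi.precision contribute nothing to the
result.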

Richard.
