https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117924

--- Comment #1 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Looking at the dse3 dump we get:

  <bb 2> [local count: 1073741824]:
  MEM[(struct _Bvector_impl_data *)&data] ={v} {CLOBBER(bob)};
  MEM[(struct __as_base  &)&data] ={v} {CLOBBER(bob)};
  _13 = MEM[(const struct vector *)src_2(D)].D.25603._M_impl.D.25087._M_start.D.16487._M_p;
  _14 = MEM[(const struct _Bit_iterator &)src_2(D) + 16].D.16487._M_offset;
  _15 = MEM[(const struct _Bit_iterator &)src_2(D) + 16].D.16487._M_p;
  _16 = _15 - _13;
  _17 = _16 * 8;
  _18 = (long int) _14;
  _19 = _17 + _18;
  _20 = (long unsigned int) _19;
  if (_20 != 0)  
    goto <bb 3>; [33.00%]
  else
    goto <bb 22>; [67.00%]

  <bb 22> [local count: 719407024]:
  goto <bb 5>; [100.00%]

  <bb 3> [local count: 354334800]:
  _25 = _20 + 63;
  _26 = _25 >> 6;
  _45 = _26 * 8;
  _46 = operator new (_45);


If I am reading this right, _20 is the size of the vector in bits.
<bb 3> then rounds the size in bits up to a size in 64-bit words, which is
then converted to a size in bytes.

_16 already holds the size of the allocated vector in bytes. I think all of
this can be simplified to using it and adding one word if the offset is non-0.
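
To make the arithmetic concrete, here is a minimal C++ sketch of both computations. The function names and parameters are hypothetical and only mirror the SSA names in the dump (byte_diff for _16, offset for _14). It assumes 64-bit words, a byte distance that is a non-negative multiple of 8, and an offset below 64. It is not the libstdc++ code.

```cpp
#include <cassert>
#include <cstddef>

// Size computed the way the dump does it: bits -> 64-bit words -> bytes.
std::size_t alloc_bytes_via_bits(std::ptrdiff_t byte_diff, unsigned offset) {
    assert(byte_diff >= 0 && byte_diff % 8 == 0 && offset < 64);
    std::size_t bits = static_cast<std::size_t>(byte_diff) * 8 + offset;  // _20
    if (bits == 0)
        return 0;                          // the dump skips the allocation here
    std::size_t words = (bits + 63) >> 6;  // _25, _26: round up to whole words
    return words * 8;                      // _45: words back to bytes
}

// Suggested simplification: start from the byte distance and add one
// 8-byte word when the last word is only partially used.
std::size_t alloc_bytes_direct(std::ptrdiff_t byte_diff, unsigned offset) {
    assert(byte_diff >= 0 && byte_diff % 8 == 0 && offset < 64);
    return static_cast<std::size_t>(byte_diff) + (offset != 0 ? 8 : 0);
}
```

Under those assumptions the two functions agree, because (8 * byte_diff + offset + 63) >> 6 equals byte_diff / 8 plus one exactly when offset is non-zero.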

Later we do:
  <bb 6> [local count: 966367640]:
  _54 = (long unsigned int) _16;
  __builtin_memcpy (data$_M_p_173, _13, _54);

Here _16 bytes are copied, so the last word (if the offset is non-0) must be
copied separately:

  if (_14 == 0)
    goto <bb 23>; [10.20%]
  else
    goto <bb 24>; [89.80%]

  <bb 24> [local count: 867798143]:
  _55 = data$_M_p_173 + _54;
  goto <bb 11>; [100.00%]

....

 <bb 11> [local count: 964220160]:
  # __result$_M_p_40 = PHI <data$_M_p_173(26), _55(24), _57(30)>

  <bb 12> [local count: 9453138808]:
  # __first$_M_offset_154 = PHI <__first$_M_offset_58(21), 0(11)>
  # __first$_M_p_151 = PHI <__first$_M_p_112(21), _15(11)>
  # __result$_M_p_136 = PHI <__result$_M_p_105(21), __result$_M_p_40(11)>
  _82 = 1 << __first$_M_offset_154;
  _84 = *__first$_M_p_151;
  _85 = _82 & _84;
  pretmp_88 = *__result$_M_p_136;
  if (_85 != 0)
    goto <bb 13>; [50.00%]
  else
    goto <bb 14>; [50.00%]

This copies the last byte but takes care to zero out the unused part of the
vector. Why?

  <bb 13> [local count: 4726569404]:
  _90 = _82 | pretmp_88;
  goto <bb 15>; [100.00%]

  <bb 14> [local count: 4726569404]:
  _92 = ~_82;
  _93 = pretmp_88 & _92;

  <bb 15> [local count: 9453138808]:
  # cstore_138 = PHI <_90(13), _93(14)>
  *__result$_M_p_136 = cstore_138;

This seems to be computing the last byte.
It is not clear to me why it is still considered live at this point, but the
whole code could also just be replaced by memcpying the last word as well.  We
compute the size of the memory allocated, so we can simply copy everything, right?
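
As a rough illustration of the two strategies, here is a hedged C++ sketch. The helper names are hypothetical, and the bitwise loop is only my reading of what <bb 12> through <bb 15> do for the trailing word. It assumes 64-bit words and that both buffers hold at least full_words + 1 words whenever offset is non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bit-by-bit copy of the partial last word: each of the low `offset` bits
// is either set (as in bb 13) or cleared (as in bb 14) in the destination.
void copy_tail_bitwise(std::uint64_t* dst, const std::uint64_t* src,
                       unsigned offset) {
    assert(offset < 64);
    for (unsigned i = 0; i < offset; ++i) {
        std::uint64_t mask = std::uint64_t{1} << i;
        if (*src & mask)
            *dst |= mask;
        else
            *dst &= ~mask;
    }
}

// Suggested alternative: copy every allocated word in one memcpy,
// including the partially used last word.
void copy_all_words(std::uint64_t* dst, const std::uint64_t* src,
                    std::size_t full_words, unsigned offset) {
    assert(offset < 64);
    std::size_t words = full_words + (offset != 0 ? 1 : 0);
    if (words != 0)
        std::memcpy(dst, src, words * sizeof(std::uint64_t));
}
```

Both variants produce the same values for the bits that belong to the vector. They can differ only in the unused high bits of the last word, which the bitwise version leaves as they were in the destination and the memcpy version takes from the source.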
