https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117924
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #5)
> Note there is another way of solving this. From my analysis (which I wrote in
> PR 121921):
> currently DSE5 can remove the stores:
> ```
> Deleted dead store: MEM[(struct __as_base &)&data] ={v} {CLOBBER(bob)};
>
> Deleted dead store: MEM[(struct _Bvector_impl_data *)&data] ={v} {CLOBBER(bob)};
>
> ```
> But DCE7 (which is right afterwards) does not remove `operator new/delete`
> because of this missed optimization, and then forwprop4 (which is right after
> dce7) is able to see that (b+s) - (b+s - b) is just b, and then later on the
> next DCE optimizes away the new/delete pair.
>
> > Unused new/delete pair is only being determined at cddce3 which is bit
> > late.
>
> The reason why it is not done beforehand is that `e - (e - b)` is not
> optimized to b until forwprop4, which is right after dce7. If `e - (e - b)`
> got folded in, say, fre1:
> ```
> _1 = this_15(D)->_M_impl.D.25104._M_start.D.16464._M_p;
> ...
> _20 = MEM[(const struct _Bvector_impl *)this_15(D)].D.25104._M_end_of_storage;
> _5 = _20 - _1; // e - b
> _8 = (long unsigned int) _5;
> _9 = -_8;
> _10 = _20 + _9; // e - (e - b)
> _11 = &this_15(D)->_M_impl;
> operator delete (_10, _8);
> ```
> We should recognize the operator new/delete pair earlier too.
Nope, because we are still left with:
```
_133 = _34 + _33;
...
_9 = _133 - _34;
_10 = (long unsigned int) _9;
```
which is still not converted into _33 until forwprop.
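For reference, the two folds discussed so far can be written at the source level. This is only a rough C++ sketch: the function names are made up for illustration, `b` stands in for `_34`, `s` for `_33`, and `e` for the end-of-storage pointer. It is not taken from libstdc++ or from the testcase.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring `_133 = _34 + _33; _9 = _133 - _34;`.
// b plays the role of _34 (start), s the role of _33 (size in bytes).
std::ptrdiff_t size_from_end(const char *b, std::ptrdiff_t s)
{
    const char *e = b + s;  // _133 = _34 + _33
    return e - b;           // _9 = _133 - _34, which is just s
}

// Hypothetical helper mirroring `e - (e - b)` from comment #5.
const char *start_from_end(const char *b, const char *e)
{
    return e - (e - b);     // folds to b
}

// Small usage check: both identities hold for pointers into one array.
void check_identities()
{
    char buf[16] = {};
    assert(size_from_end(buf, 16) == 16);
    assert(start_from_end(buf, buf + 16) == buf);
}
```

Both functions only hold as written when `b` and `e` point into the same allocation, which is the case for the vector storage discussed here.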
The reason is that fre5 does not get it, because that would need jump threading:
```
<bb 8> [local count: 111448560]:
# _150 = PHI <_34(7), 0B(4), _34(6)>
# data$D25093$_M_end_of_storage_175 = PHI <_28(7), 0B(4), _28(6)>
__first ={v} {CLOBBER(eos)};
__result ={v} {CLOBBER(eos)};
if (_150 != 0B)
goto <bb 10>; [53.47%]
else
goto <bb 11>; [46.53%]
...
<bb 10> [local count: 58514395]:
_9 = data$D25093$_M_end_of_storage_175 - _150;
```
In theory we could optimize:
```
_28 = _34 + _33;
...
<bb 10> [local count: 111448560]:
# __result_72 = PHI <_69(7), _34(8), _71(9), 0B(4)>
# _150 = PHI <_34(7), _34(8), _34(9), 0B(4)>
# data$D25093$_M_end_of_storage_175 = PHI <_28(7), _28(8), _28(9), 0B(4)>
...
_9 = data$D25093$_M_end_of_storage_175 - _150;
_10 = (long unsigned int) _9;
```
Into:
```
<bb 10> [local count: 111448560]:
# _t = PHI<_33(7),_33(8),_33(9),0>
...
_9 = (long int) _t;
_10 = (long unsigned int) _9;
...
```
But I am not sure how expensive this would be in compile time. With that in
place, ccp4 would then give us decent code.
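To make the idea concrete, here is a rough source-level sketch of the same rewrite in C++. The function names and the `allocated` flag are hypothetical (the flag stands in for the edges coming into the join block), and this is not how the pass would implement it; it only shows that subtracting after the join and subtracting per incoming edge give the same value.

```cpp
#include <cassert>
#include <cstddef>

// Before: both pointers are merged first, then subtracted, i.e.
// PHI(_28, 0B) - PHI(_34, 0B). The fold to _33 is hidden behind the PHIs.
std::ptrdiff_t diff_after_join(const char *buf, std::ptrdiff_t s, bool allocated)
{
    const char *b = nullptr;  // _150 on the edge from bb 4
    const char *e = nullptr;  // end_of_storage on the edge from bb 4
    if (allocated) {
        b = buf;              // _34
        e = buf + s;          // _28 = _34 + _33
    }
    return e - b;             // _9 (null - null is 0 in C++)
}

// After: the subtraction is done on each incoming edge, where it folds
// to _33 or 0, and only the result is merged: _t = PHI(_33, 0).
std::ptrdiff_t diff_per_edge(std::ptrdiff_t s, bool allocated)
{
    std::ptrdiff_t t = 0;     // value on the edge from bb 4
    if (allocated)
        t = s;                // (_34 + _33) - _34 folded to _33
    return t;                 // _9 = (long int) _t
}
```

The second form no longer mentions the pointers at all, which is what would let later passes see that the allocation is unused.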