On Tue, Mar 12, 2024 at 01:47:53PM +0100, Richard Biener wrote: > > Admittedly the above is the ugliest part of the patch IMHO. > > It isn't needed in all cases, but e.g. for the pr112709-2.c (qux) case > > we have before ubsan pass > > # _7(ab) = PHI <_20(9), _8(ab)(11)> > > _22 = bar (*_7(ab)); > > and want to instrument the *_7(ab) load among the parameters for object > > size. > > So, it wants to add > > _2 = __builtin_dynamic_object_size (_7(ab), 0); > > sequence and later: > > .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0); > > before the call insn. If it isn't a noreturn call, it would just > > add those statements into the same basic block using gsi_insert*before. > > But because it is a returns_twice call, it needs to be done in a different > > block. And having all of ubsan/asan/bitintlower find that out first and > > figure out that it should use _20 instead of _7(ab) seems hard, especially > > that for cases like: > > I would have expected the result as > > # _7 = PHI <_20(9)> > _2 = __builtin_dynamic_object_size (_7, 0); > .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0); > // fallthru > > # _99(ab) = PHI <_7, _8(ab)(11)> > _22 = bar (*_99(ab)); > > where all uses of _7 in the IL before splitting get replaced via > their immediate uses but the to-be inserted IL can continue using > the defs (that requires the not inserted IL to not have update_stmt > called on them, of course). > > I think split_block would have the PHIs organized that way already, > you are adding the _99 defs anyway, right? That is, > > + tree new_lhs = copy_ssa_name (lhs); > + gimple_phi_set_result (phi, new_lhs); > + gphi *new_phi = create_phi_node (lhs, other_edge->dest); > > here you swap the defs, I'd omit that. And instead essentially do > replace_uses_by (lhs, new_lhs) here (without the folding and stuff). > You have to clear SSA_NAME_OCCURS_IN_ABNORMAL_PHI on the old > and set it on the new LHS if it was set. > > I think that should contain the "SSA update" to > edge_before_returns_twice_call for the case it splits the block?
I have considered that, but there are problems with that. The most important is that in the most common case, there is no block splitting nor any edge splitting, all we have is that # _99(ab) = PHI <_7(6), _8(ab)(11)> _22 = bar (*_99(ab)); or similar. And we can do several insertions of statements or sequences before the bar call. If it is not returns_twice or even not a call, we want to use _99(ab) in there and insert just before the call/statement. If it is returns_twice call, we want to replace the _99(ab) uses with something that is actually usable on the 6->10 edge. So, if we wanted to keep using the _99(ab) (wouldn't it be even wrong that it is (ab)?), we'd need to essentially on every addition add statements like _99(ab) = _7; (for each PHI node except the virtual one) before the added statement or statements (or add it only if used in the statement/sequence?) and change the PHI to be # _100(ab) = PHI <_99(ab)(6), _8(ab)(11)> and replace all uses: _22 = bar (*_100(ab)); Right now ubsan can insert something like 3 statements/sequences per argument, so there can be thousands of them and the block could have thousands of PHIs, so that is millions of useless SSA_NAME copies. So, the intention of edge_before_returns_twice_call is just that it in the common case just finds the non-EDGE_ABNORMAL edge if there is one, if there isn't just one, it adjusts the IL such that there is just one. And then the next step is to handle that case. Jakub