On Wed, 2012-11-07 at 15:01 -0800, Richard Henderson wrote: > I wrote the second of these patches first, and I'm uncertain about the > desirability of the first of the patches. > > While working on the uninstrumented code path bulk patch, I noticed that > nested transactions within the copy of the outermost transaction
I assume "within" refers to within the same static scope of the function? > were > not being processed for an uninstrumented code path, and so were only > receiving an instrumented path. This is clearly less than ideal when > considering HTM. Yes, it would be better if nested transactions would have an uninstrumented code path too. > Now, it seemed to me that if we're already in an uninstrumented code > path, we're extremely likely to want to stay in one. This is certainly > true for HTM, as well as when we've selected the serialirr method. I > can't think off hand of any other reason we'd be on the uninstrumented > code path. > > Therefore the second patch arranges for all nested transactions in the > uninstrumented path to _only_ have uninstrumented paths themselves. This is only possible if the nested transaction cannot abort. If it potentially can, then we must have an instrumented code path for it. > While reviewing the results of this second patch in detail, I noticed > that nested transactions on the instrumented code paths were not > generating both instrumented and uninstrumented code paths. My first > reaction was that this was a bug, and so I wrote the first patch to > fix it. > > But as I was reviewing the patch to write the changelog, I began to > wonder whether the same logic concerning the original instrumented/ > uninstrumented choice applies as well to the instrumented path. > > Is it ever likely that we'd choose an uninstrumented path for a > nested transaction, given that we're already executing the instrumented > path for an outer transaction? It could be a performance benefit in cases like __transaction_relaxed { // something if (foo) { unsafe_action(); transaction_relaxed { // lots of code } } } Here, we go serial at unsafe_action and can then cause less overall slowdown when running "lots of code" as uninstrumented (but it's just avoiding the per-memory-access function calls). If the txn can acquire the serial lock and commit afterwards, then it doesn't have to restart at the outermost txn. I can't say whether such cases would appear often enough in practice to offset the code size increase (and thus potential slowdowns) caused by having uninstrumented code paths available too. > It now seems to me that the only time we'd switch from instrumented > to uninstrumented code path would be during a transaction restart, > after having selected to retry with a serialirr method. > > Which means that I should apply the second patch only, > > Thoughts? Another thing to consider is whether flat nesting could be used: for nested transactions that have hasNoAbort set, we can elide txn begin/commit calls and just embed the nested txn body in the enclosing txn. This would be a benefit on uninstrumented/uninstrumented code path pairs because the nested txn can't do anything but keep running. For instrumented/instrumented (with hasNoAbort set), we could also flatten but there is a performance trade-off: the TM runtime lib could do closed nesting on data conflicts, so that it tries to roll back only the inner txn on a conflict. The potential benefit of such a scheme depends a lot on the workload, and libitm currently also flattens txns that do not abort. OTOH, the overhead of the begin/commit calls for nested txns should also be relatively small compared to overall STM/instrumented overheads. So, to summarize: - If the txn could abort, then it must have instrumented and could have uninstrumented. - If the nested txn does not abort, then: - For uninstrumented / uninstrumented nesting, flat nesting makes sense (and thus we don't need instrumented for the nested). - For instrumented / instrumented, we could flatten, but unsure whether that significantly improves performance. - Nested uninstrumented could be useful with enclosing nested, but perhaps this doesn't happen often enough to offset the code size increase. Thoughts? Torvald