On Tue, 2013-05-21 at 14:40 +0200, Andreas Krebbel wrote:
> Hi,
>
> I'm currently implementing support for hardware transactional memory
> in the S/390 backend and ran into a problem with saving and restoring
> the floating point registers.
>
> On S/390 the tbegin instruction starts a transaction. If a subsequent
> memory access collides with another the transaction is aborted. The
> execution then continues *after* the tbegin instruction. All memory
> writes after the tbegin are rolled back, the general purpose registers
> selected in the tbegin operand are restored, and the condition code is
> set in order to indicate that an abort occurred. What the code then is
> supposed to do is to check the condition code and either jump back to
> the transaction if it is a temporary failure or provide an alternate
> implementation using e.g. a lock.
>
> Unfortunately our tbegin instruction does not save the floating point
> registers leaving it to the compiler to make sure the old values get
> restored. This will be necessary if the abort code relies on these
> values and the transaction body modifies them.
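
The control flow described in the quoted text (check the outcome, retry on
a temporary failure, otherwise fall back to a lock) can be sketched in
portable C. This is only an illustration under stated assumptions: it does
not use the real tbegin instruction or any GCC builtin. `fake_tbegin`, the
`tx_status` values, `MAX_RETRIES`, and the `lock_taken` counter are all
hypothetical stand-ins, and the simulation does not model the rollback of
memory writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes; the real condition-code values differ. */
enum tx_status { TX_STARTED, TX_ABORT_TRANSIENT, TX_ABORT_PERSISTENT };

/* Hypothetical stand-in for the hardware begin instruction: it simply
   plays back a scripted sequence of outcomes. */
static const enum tx_status *script;

static enum tx_status fake_tbegin(void)
{
    assert(script != NULL);
    return *script++;
}

static int counter;     /* shared data updated by the critical section */
static int lock_taken;  /* counts how often the fallback path was used */

#define MAX_RETRIES 3   /* assumed retry budget, chosen arbitrarily */

/* Returns 1 if the update ran on the transactional path,
   0 if it ran on the lock-based fallback path. */
static int increment(void)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        enum tx_status s = fake_tbegin();
        if (s == TX_STARTED) {
            counter++;          /* transaction body */
            return 1;           /* real code would commit here */
        }
        if (s == TX_ABORT_PERSISTENT)
            break;              /* retrying cannot help */
        /* transient abort: loop and try again */
    }
    lock_taken++;               /* stands in for acquiring a real lock */
    counter++;
    return 0;
}
```

With a script of one transient abort followed by a successful start,
`increment()` retries once and takes the transactional path; with a
persistent abort it goes straight to the fallback.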

You could also start with supporting s390 HTM through the transactional
language constructs we already support (__transaction_atomic etc.) and
libitm. The advantage would be that you can reuse quite a few bits of
existing machinery (e.g., different fallbacks when the HTM can't execute
a certain transaction, some analyses on the compilation side); however,
this doesn't give programmers as much control as using the HTM directly,
and it requires a function call on begin and commit when using the
current libitm ABI.

(I know that this is kind of a side note, because you seem to be looking
for a way to expose this at the granularity of HTM begin/commit builtins
(e.g., to base lock elision implementations on top of it); but I think
that in the long run transactional language constructs are easier for
many users.)

> With my current approach I try to place FPR clobbers to trigger GCC
> generating the right save/restore operations. This has some
> drawbacks:
>
> - Bundling the clobbers with the tbegin causes FPRs to be restored
>   even in the good path (the transaction never aborts).
>
> - Placing the clobbers on the abort path kinda works. However it is
>   not really correct. GCC could decide to wrap the save/restore
>   operations just around the clobbers which would be wrong. A solution
>   to that might be to (that's what I'm currently working on):
>
>   - Bundle the tbegin with the conditional jump to the abort code in
>     order to prevent GCC from saving the FPRs right after the tbegin.
>
>   - Direct an abnormal edge to the abort code to tell GCC that the
>     FPRs are actually clobbered from somewhere outside (as with EH).
>
>   Does this sound reasonable?
>
> The point is that not all the execution paths through tbegin
> actually clobber FPRs. It is only true for the paths which lead to
> the abort code in the end. So another solution might be to
> implement support for conditional clobbers. Clobbers wrapped into a
> cond_exec perhaps.
> I'm not sure how difficult this would be to
> implement and whether it would be worth it?!
>
> This also has implications for the ABI and the prologue/epilogue
> generation. Consider a function with just a tbegin:
> int foo () { return __builtin_tbegin (); }
>
> foo needs to save and restore *all* the call-saved FPRs since the
> transaction body continuing in the caller of foo might modify a
> call-saved FPR and trigger an abort. If foo would not save and
> restore the FPRs it could end up clobbering call-saved FPRs violating
> the ABI.
>
> (Note: Be aware that since transactions roll back all memory
> operations this also applies to stack manipulations. So with a
> function like foo above it will happen that during an abort you return
> to a callee which already returned. The stack frame of foo will be
> restored by the transaction. So compared to setjmp/longjmp jumping to
> a callee is supposed to work reliably even if the stack content of the
> callee has been clobbered in between.)
>
> The additional prologue/epilogue FPR backups for TXs can only be
> avoided if the transaction is fully contained in the function body
> (and does not use the FPRs). I call these non-escaping transactions.

That's what __transaction_atomic etc. give you. I believe we already
check whether we need to save/restore vector registers, but I guess
we're not checking for FPRs.

> I've implemented a check which deals with the most common situations
> using the post-dominance tree. If all the tbegin BBs are
> post-dominated by a tend BB I redo the df_regs_ever_live computation
> from scratch after reload removed the clobbers. But this
> unfortunately doesn't help with TX instructions being used as part of
> a library like with libitm.

In libitm, it's probably easier to write custom assembly code for
ITM_beginTransaction that saves/restores all additional bits not
restored by the HTM explicitly through a partial SW setjmp.
This approach at least worked well for AMD's ASF, which didn't even
restore all normal registers.

Torvald
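
As a rough illustration of the "partial SW setjmp" idea: state that the
hardware would not restore is saved in software before the transaction
begins and put back on the abort path. The sketch below is an analogy in
portable C, not libitm code; a real begin routine would be custom assembly.
`fake_fpr`, `tx_body`, and `run_tx` are hypothetical names, `longjmp`
merely simulates a hardware abort, and a plain variable stands in for a
call-saved FPR.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical checkpoint taken at transaction begin. */
static jmp_buf checkpoint;

/* Stand-in for a call-saved FPR that the HTM does not restore. */
static volatile double fake_fpr;

/* Simulated transaction body: clobbers the "register" and may abort. */
static void tx_body(int should_abort)
{
    fake_fpr = 99.0;
    if (should_abort)
        longjmp(checkpoint, 1);  /* simulated abort: resume at begin */
}

/* Returns 1 if the body ran to completion, 0 if it aborted and the
   software-saved state was restored. */
static int run_tx(int should_abort)
{
    const double saved = fake_fpr;  /* software save before begin */

    if (setjmp(checkpoint) != 0) {
        fake_fpr = saved;           /* abort path: software restore */
        return 0;
    }
    tx_body(should_abort);
    return 1;
}
```

After an aborted run, `fake_fpr` holds its pre-transaction value again;
after a completed run it keeps the value written by the body.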