Hi Andrew,

> Does it make any sense to simply predefine the possible valid
> combinations with the HLE bit already set? it at least removes any
> possible invalid combinations and forces the programmer to consciously
> choose their memory model.
>
> ie,
> __ATOMIC_HLE_XACQ_CONSUME
> __ATOMIC_HLE_XACQ_ACQUIRE
> __ATOMIC_HLE_XACQ_ACQ_REL
> __ATOMIC_HLE_XACQ_SEQ_CST
>
> __ATOMIC_HLE_XREL_RELEASE
> __ATOMIC_HLE_XREL_ACQ_REL
> __ATOMIC_HLE_XREL_SEQ_CST
>
> or whatever happens to be valid... Doesn't really scale to adding more
> new bits later, but perhaps that doesn't matter.
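For concreteness, the quoted proposal could be sketched roughly as below. This is only an illustration, not what any compiler defines: the bit positions, the `MM_`/`HLE_` names (written without the reserved `__ATOMIC_` prefix) and the helper `hle_model_is_valid` are all assumptions, and the set of valid pairings is simply the tentative list from the quote.

```c
#include <stdbool.h>

/* Base memory models, numbered 0..5 in the usual order (assumed here). */
enum {
    MM_RELAXED = 0, MM_CONSUME = 1, MM_ACQUIRE = 2,
    MM_RELEASE = 3, MM_ACQ_REL = 4, MM_SEQ_CST = 5
};

/* Hypothetical layout: HLE hint bits sit above the base model. */
#define MM_MASK      0xffff
#define HLE_XACQ_BIT (1 << 16)
#define HLE_XREL_BIT (1 << 17)

/* Predefined combinations, mirroring the names in the proposal. */
enum {
    HLE_XACQ_CONSUME = HLE_XACQ_BIT | MM_CONSUME,
    HLE_XACQ_ACQUIRE = HLE_XACQ_BIT | MM_ACQUIRE,
    HLE_XACQ_ACQ_REL = HLE_XACQ_BIT | MM_ACQ_REL,
    HLE_XACQ_SEQ_CST = HLE_XACQ_BIT | MM_SEQ_CST,

    HLE_XREL_RELEASE = HLE_XREL_BIT | MM_RELEASE,
    HLE_XREL_ACQ_REL = HLE_XREL_BIT | MM_ACQ_REL,
    HLE_XREL_SEQ_CST = HLE_XREL_BIT | MM_SEQ_CST
};

/* Returns true if `model` is a plain base model or one of the
 * predefined HLE combinations listed above. */
static bool hle_model_is_valid(int model)
{
    int base = model & MM_MASK;
    int hle  = model & (HLE_XACQ_BIT | HLE_XREL_BIT);

    if (model & ~(MM_MASK | HLE_XACQ_BIT | HLE_XREL_BIT))
        return false;                 /* unknown extra bits */
    if (base > MM_SEQ_CST)
        return false;                 /* not a known base model */
    if (hle == 0)
        return true;                  /* plain model, no HLE hint */
    if (hle == HLE_XACQ_BIT)
        return base == MM_CONSUME || base == MM_ACQUIRE ||
               base == MM_ACQ_REL || base == MM_SEQ_CST;
    if (hle == HLE_XREL_BIT)
        return base == MM_RELEASE || base == MM_ACQ_REL ||
               base == MM_SEQ_CST;
    return false;                     /* both hint bits set at once */
}
```

With names like these, a caller never ORs a hint bit onto a model by hand, so a pairing such as XACQ with RELEASE has no spelling at all; the checker only matters for values built some other way.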
The idea sounds good to me. It would certainly be much harder to misuse, so it would generally be a better interface.

As to what combinations make sense: an HLE region ACQUIRE has somewhat interesting ordering semantics. It's a fairly strong barrier (LOCK prefix) for reads and writes. The HLE RELEASE is either a LOCK-prefixed instruction too, or a MOV without LOCK. If it's running transactionally, the whole block acts like a LOCK too. But we have to assume the weakest of these.

I suppose that would map to always _SEQ_CST for most instructions, except for the MOV release, which can be _RELEASE too (and would need an additional MFENCE generated for anything stronger).

Probably there is not a lot of value in allowing the optimizer weaker models than what the CPU does.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.