On 7/2/15 4:30 AM, Alan Lawrence wrote:
Hi, pleased to meet you :)
Likewise. :-) [Abe wrote:]
* Always safe for stores, sometimes a little risky for loads: speculative loads might cause multithreaded programs with insufficient locking to fail due to writes by another thread being "lost"/"missed", even though the same program works OK "by luck" when compiled without if-conversion of loads. This risk comes mainly/only from what the relevant literature calls a "half hammock": an "if" with a "then" section but no "else" section [or effectively vice-versa, e.g. an empty "then" and a non-empty "else"]. In this case, e.g. "if (c) X[x] = Y[y];" with no attached "else" section is risky to fully if-convert in the event of the code being compiled running multithreaded and not having been written with all the locking it really needs. Respectively, e.g. "if (c) ; /* empty ''then'' */ else X[x] = Y[y];".
[Alan wrote:]
For the unenlightened, can you outline the problem with this code sequence? (i.e. the expected transformation that makes it unsafe!?) I would hope your scratchpad patch would turn this into something like
a1 = c ? &Y[y] : &scratch; temp = *a1; a2 = c ? &X[x] : &scratch; *a2 = temp;
which seems OK to me
Yes, you are right. The problem I was thinking about is not present in the above: in the "'c' is false" case, the vectorized code for the above just wastes some effort by reading garbage from the scratchpad and writing it back to the scratchpad.
so is the scratchpad approach going away?
Not at all. :-) My example[s] was/were not written well with regard to expressing what I had in mind. The problem I was thinking about is shown with a scalar destination, e.g.: if (c) foo = X[x]; ... which is if-converted into the equivalent of: foo = c ? X[x] : foo; The [perceived/potential] problem with the preceding is that the part that is equivalent to "foo = foo;" in the above can _not_ be optimized out, as it normally would for a non-"volatile" "foo", because it is part of a larger vectorized operation and this small part cannot be broken out of the whole without breaking vectorization. Therefore, the value of "foo" might be read and then written back a few {micro|nano|pico|whatever}-seconds later, which may cause an update to the same location to be overwritten. That`s why the pathological program-under-compilation is a badly-written multithreaded program without enough locking: without the "if something, then overwrite" being replaced by "read, then if something then write the new value, and if not that same something then rewrite the old value", the badly-written multithreaded program might work correctly through "good luck", but with the replacement [i.e. the if conversion] the chances of success [where "success" here basically means not "missing" a write by another thread] is lower than it was before. However, after some discussion with Sebastian I learned that this is already taken care of, i.e. safe: the "read, then if something then write the new value, and if not that same something then rewrite the old value" replacement strategy is only used for thread-local scalars. For global scalars and static scalars, we treat the scalar as if it were the first element of a length=1 array and don`t have this problem. In other words, the problem about which I was concerned is not going to be triggered by e.g. "if (c) x = ..." which lacks an attached "else x = ..." in a multithreaded program without enough locking just because 'x' is global/static. The only remaining case to consider is if some code being compiler takes the address of something thread-local and then "gives" that pointer to another thread. Even for _that_ extreme case, Sebastian says that the gimplifier will detect this "address has been taken" situation and do the right thing such that the new if converter also does the right thing. TLDR: Abe was being too paranoid; to the best of our knowledge, it`s OK as-is. ;-) [Abe wrote:]
One of the reasons the new if converter has not yet been submitted for incorporation into GCC`s trunk is that it still has some performance regressions WRT the old converter, and most of those are "true regressions", i.e. not just because the old converter was less safe and the additional safety is what is causing the loss, but rather because there is more work to do before the patch is ready.
As of this writing, the new if converter sometimes tries to "convert" something that is already vectorizer-friendly, and in doing so it renders that code now-NOT-vectorizer-friendly.
[Alan wrote:]
Can you give an example?
The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and "gcc.dg/vect/vect-mask-load-1.c" currently test as: the new if-converter is "converting" something that`s already vectorizer-friendly, messing up the IR in the process and thus disabling vectorization for the test case in question.
My understanding was that the existing vectorizer bailed out pretty much straightaway if the number of basic blocks in the loop was not exactly 2 (for inner loops) or 5 (for outermost loops, i.e. containing exactly one inner loop)...
I see nothing in that text that I know to be wrong.
that seems to rule out vectorization of *any* kind of conditional execution, that the if-converter might convert?
If the converter takes something that`s already good [e.g. manually "converted"] and doesn`t "understand" that the code is already good, it may analyze the code in question, arrive at the conclusion "if-conversion candidate", and then "convert" it, making a mess in the process. This mess, in turn, causes the vectorizer to arrive at the conclusion "NOT a vectorization candidate" whereas without the incorrect "conversion" its conclusion would have been "vectorization candidate" and it would have vectorized it. However, TTBOMK the vectorizer already "understands" that in cases where its input looks like: x = c ? y : z; ... and 'y' and 'z' are both pure [side-effect-free] -- including, but not limited to, they must be non-"volatile" -- it may vectorize a loop containing code like the preceding, ignoring for this particular instance the C mandate that only one of {y, z} be evaluated, depending on the value of 'c', since when both are truly pure expressions it`s only at worst a waste of time/energy/both, not a semantics issue, and the vectorizer is either assuming that it`s worth it to make the transformation due to the factors-level speedup of vectorization, or it is using a cost model and arriving at the conclusion that for the loop in question, vectorization is worth it. Normally, the code would not really look all that close to the preceding, but rather more like: X[x] = c ? Y[y] : Z[z]; ... since if all the operands are scalars, it`s not very likely to be worth vectorizing, of course. The code in the loop being vectorized might even look like: X[x] = C[c] ? Y[y] : Z[z]; ... i.e. the condition could also be read from an array rather than from a scalar, regardless of whether or not that scalar [in the previous example] is a compiler-generated temporary. I hope the above is helpful. Regards, Abe