On 7/2/15 4:30 AM, Alan Lawrence wrote:

Hi, pleased to meet you :)

Likewise.  :-)


[Abe wrote:]

* Always safe for stores, sometimes a little risky for loads:
   speculative loads might cause multithreaded programs with
   insufficient locking to fail due to writes by another thread
   being "lost"/"missed", even though the same program works OK
   "by luck" when compiled without if-conversion of loads.
   This risk comes mainly/only from what the relevant literature
   calls a "half hammock": an "if" with a "then" section but no
   "else" section [or effectively vice-versa, e.g. an empty "then"
   and a non-empty "else"].  In this case, e.g. "if (c)  X[x] = Y[y];"
   with no attached "else" section is risky to fully if-convert
   in the event of the code being compiled running multithreaded
   and not having been written with all the locking it really needs.
   Respectively, e.g. "if (c)  ; /* empty ''then'' */  else  X[x] = Y[y];".


[Alan wrote:]

For the unenlightened, can you outline the problem with this code sequence?
(i.e. the expected transformation that makes it unsafe!?)
I would hope your scratchpad patch would turn this into something like

a1 = c ? &Y[y] : &scratch;
temp = *a1;
a2 = c ? &X[x] : &scratch;
*a2 = temp;

which seems OK to me

Yes, you are right.  The problem I was thinking about is not present in the 
above:
in the "'c' is false" case, the vectorized code for the above just wastes some 
effort
by reading garbage from the scratchpad and writing it back to the scratchpad.


so is the scratchpad approach going away?

Not at all.  :-)


My example[s] was/were not written well with regard to expressing what I had in 
mind.
The problem I was thinking about is shown with a scalar destination, e.g.:

  if (c)  foo = X[x];

... which is if-converted into the equivalent of:

  foo = c ? X[x] : foo;


The [perceived/potential] problem with the preceding is that the part that is equivalent 
to "foo = foo;"
in the above can _not_ be optimized out, as it normally would for a non-"volatile" 
"foo",
because it is part of a larger vectorized operation and this small part cannot 
be broken out of the whole
without breaking vectorization.  Therefore, the value of "foo" might be read 
and then written back
a few {micro|nano|pico|whatever}-seconds later, which may cause an update to 
the same location to be overwritten.
That`s why the pathological program-under-compilation is a badly-written 
multithreaded program without enough locking:
without the "if something, then overwrite" being replaced by "read, then if 
something then write the new value,
and if not that same something then rewrite the old value", the badly-written 
multithreaded program might work correctly
through "good luck", but with the replacement [i.e. the if conversion] the 
chances of success
[where "success" here basically means not "missing" a write by another thread] 
is lower than it was before.

However, after some discussion with Sebastian I learned that this is already 
taken care of, i.e. safe:
the "read, then if something then write the new value, and if not that same 
something then rewrite the old value"
replacement strategy is only used for thread-local scalars.  For global scalars 
and static scalars,
we treat the scalar as if it were the first element of a length=1 array and 
don`t have this problem.

In other words, the problem about which I was concerned is not going to be triggered by 
e.g. "if (c)  x = ..."
which lacks an attached "else  x = ..." in a multithreaded program without 
enough locking just because 'x' is global/static.

The only remaining case to consider is if some code being compiler takes the address of 
something thread-local and then "gives"
that pointer to another thread.  Even for _that_ extreme case, Sebastian says 
that the gimplifier will detect this
"address has been taken" situation and do the right thing such that the new if 
converter also does the right thing.

TLDR: Abe was being too paranoid; to the best of our knowledge, it`s OK as-is.  
;-)


[Abe wrote:]

One of the reasons the new if converter has not yet been submitted
for incorporation into GCC`s trunk is that it still has some
performance regressions WRT the old converter, and most of those
are "true regressions", i.e. not just because the old converter
was less safe and the additional safety is what is causing the loss,
but rather because there is more work to do before the patch is ready.

As of this writing, the new if converter sometimes tries
to "convert" something that is already vectorizer-friendly,
and in doing so it renders that code now-NOT-vectorizer-friendly.


[Alan wrote:]

Can you give an example?

The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and 
"gcc.dg/vect/vect-mask-load-1.c"
currently test as: the new if-converter is "converting" something that`s 
already vectorizer-friendly,
messing up the IR in the process and thus disabling vectorization for the test 
case in question.


My understanding was that the existing vectorizer bailed out pretty much 
straightaway
if the number of basic blocks in the loop was not exactly 2 (for inner loops) 
or 5
(for outermost loops, i.e. containing exactly one inner loop)...

I see nothing in that text that I know to be wrong.


that seems to rule out vectorization of *any* kind of
conditional execution, that the if-converter might convert?

If the converter takes something that`s already good [e.g. manually "converted"]
and doesn`t "understand" that the code is already good, it may analyze the code 
in question,
arrive at the conclusion "if-conversion candidate", and then "convert" it, 
making a mess in the process.
This mess, in turn, causes the vectorizer to arrive at the conclusion "NOT a 
vectorization candidate"
whereas without the incorrect "conversion" its conclusion would have been
"vectorization candidate" and it would have vectorized it.

However, TTBOMK the vectorizer already "understands" that in cases where its 
input looks like:

  x = c ? y : z;

... and 'y' and 'z' are both pure [side-effect-free] -- including, but not limited to, 
they must be non-"volatile" --
it may vectorize a loop containing code like the preceding, ignoring for this 
particular instance the C mandate
that only one of {y, z} be evaluated, depending on the value of 'c', since when 
both are truly pure expressions it`s only
at worst a waste of time/energy/both, not a semantics issue, and the vectorizer 
is either assuming that it`s worth it
to make the transformation due to the factors-level speedup of vectorization, 
or it is using a cost model
and arriving at the conclusion that for the loop in question, vectorization is 
worth it.

Normally, the code would not really look all that close to the preceding, but 
rather more like:

  X[x] = c ? Y[y] : Z[z];

... since if all the operands are scalars, it`s not very likely to be worth 
vectorizing, of course.

The code in the loop being vectorized might even look like:

  X[x] = C[c] ? Y[y] : Z[z];

... i.e. the condition could also be read from an array rather than from a 
scalar,
regardless of whether or not that scalar [in the previous example] is a 
compiler-generated temporary.


I hope the above is helpful.

Regards,

Abe

Reply via email to