[PATCH] fix PR46029: reimplement if conversion of loads and stores [3nd submitted version of patch]

2015-07-09 Thread Abe

Below, please find the 3nd submitted version of this patch, now with some more 
issues resolved.

Regards,

Abe





From 87af575347e216672e322bbc1b4ae0a5ab93507f Mon Sep 17 00:00:00 2001
From: Abe 
Date: Mon, 18 May 2015 14:26:29 -0500
Subject: [PATCH] fix PR46029: reimplement if conversion of loads and stores

New relative to previous version of same patch:
  * addressed comments/suggestions from Alan and Richard.
  * disabled the test cases that are currently not if-converted correctly, 
pending improvements to the new if-converter

Passed regression testing and bootstrap on AMD64 GNU/Linux [AKA 
"x86_64-unknown-linux-gnu"].

2015-06-12  Sebastian Pop  
        Abe Skolnik  

PR tree-optimization/46029
* tree-data-ref.c (struct data_ref_loc_d): Moved...
(get_references_in_stmt): Exported.
* tree-data-ref.h (struct data_ref_loc_d): ... here.
(get_references_in_stmt): Declared.

* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
* tree-if-conv.c (struct ifc_dr): Removed.
(IFC_DR): Removed.
(DR_WRITTEN_AT_LEAST_ONCE): Removed.
(DR_RW_UNCONDITIONALLY): Removed.
(memrefs_read_or_written_unconditionally): Removed.
(write_memrefs_written_at_least_once): Removed.
(ifcvt_could_trap_p): Does not take refs parameter anymore.
(ifcvt_memrefs_wont_trap): Removed.
(has_non_addressable_refs): New.
(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
Removed use of refs.
(if_convertible_stmt_p): Removed use of refs.
(if_convertible_gimple_assign_stmt_p): Same.
(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
(insert_address_of): New.
(create_scratchpad): New.
(create_indirect_cond_expr): New.
(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
parameter for scratch_pad.
(combine_blocks): Same.
(tree_if_conversion): Same.

testsuite/
* g++.dg/tree-ssa/ifc-pr46029.C: New.
* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
* gcc.dg/tree-ssa/ifc-8.c: New.
* gcc.dg/tree-ssa/ifc-9.c: New.
* gcc.dg/tree-ssa/ifc-10.c: New.
* gcc.dg/tree-ssa/ifc-11.c: New.
* gcc.dg/tree-ssa/ifc-12.c: New.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
* gcc.dg/vect/pr61194.c: Disabled.
* gcc.dg/vect/vect-mask-load-1.c: Disabled.
* gcc.dg/vect/vect-mask-loadstore-1.c: Disabled.
* gcc.target/i386/avx2-gather-6.c: Disabled.
* gcc.target/i386/avx2-vect-aggressive-1.c: Disabled.
* gcc.target/i386/avx2-vect-aggressive.c: Disabled.
---
 gcc/ChangeLog  |  28 ++
 gcc/common.opt |   2 +-
 gcc/doc/invoke.texi|  29 +-
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C|  76 
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c |  17 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c |  16 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c |  13 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c  |  19 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c  |  29 ++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c  |  17 +
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c  |  10 +-
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c  |  46 +++
 gcc/testsuite/gcc.dg/vect/pr61194.c|   4 +-
 gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c   |   4 +-
 gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c  |   4 +-
 gcc/testsuite/gcc.target/i386/avx2-gather-6.c  |   4 +-
 .../gcc.target/i386/avx2-vect-aggressive-1.c   |   4 +-
 .../gcc.target/i386/avx2-vect-aggressive.c |   4 +-
 gcc/tree-data-ref.c|  13 +-
 gcc/tree-data-ref.h|  14 +
 gcc/tree-if-conv.c | 417 +
 gcc/tree-vect-stmts.c  |   2 +-
 23 files changed, 501 insertions(+), 273 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3dec6b1..70af07c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2015-05-18  Sebastian Pop  
+
+ 

revised and updated new-if-converter patch… [PATCH] fix PR46029: reimplement if conversion of loads and stores

2015-07-17 Thread Abe

Dear all,

Relative to the previous submission of this same patch, the below corrects some 
minor spacing and/or indentation issues,
misc. other formatting fixes, and makes the disabled vectorization tests be disabled via 
"xfail" rather than by adding spaces to
deliberately cause the relevant scanned-for text to not be found by DejaGNU so 
as to prevent the DejaGNU line from being interpreted.

The below is also based on a Git checkout that was rebased to the latest 
upstream check-in from today,
so it should merge cleanly with trunk as of today.

Regards,

Abe








From 89e115118dcc49d6839db2e9d7ae6c330789c235 Mon Sep 17 00:00:00 2001
Subject: [PATCH] fix PR46029: reimplement if conversion of loads and stores

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

  * loads were always executed, even when they should have not been.
Some source code could be rendered invalid due to null pointers
that were OK in the original program because they were never
dereferenced.

  * writes were if-converted via load/maybe-modify/store, which
renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

  * loads are done through an indirection, reading either the correct
data from the correct source [if the condition is true] or reading
from the scratchpad and later ignoring this read result [if the
condition is false].

  * writes are also done through an indirection, writing either to the
correct destination [if the condition is true] or to the
scratchpad [if the condition is false].

Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.

2015-06-12  Sebastian Pop  
Abe Skolnik  

PR tree-optimization/46029
* tree-data-ref.c (struct data_ref_loc_d): Moved...
(get_references_in_stmt): Exported.
* tree-data-ref.h (struct data_ref_loc_d): ... here.
(get_references_in_stmt): Declared.

* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
* tree-if-conv.c (struct ifc_dr): Removed.
(IFC_DR): Removed.
(DR_WRITTEN_AT_LEAST_ONCE): Removed.
(DR_RW_UNCONDITIONALLY): Removed.
(memrefs_read_or_written_unconditionally): Removed.
(write_memrefs_written_at_least_once): Removed.
(ifcvt_could_trap_p): Does not take refs parameter anymore.
(ifcvt_memrefs_wont_trap): Removed.
(has_non_addressable_refs): New.
(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
Removed use of refs.
(if_convertible_stmt_p): Removed use of refs.
(if_convertible_gimple_assign_stmt_p): Same.
(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
(insert_address_of): New.
(create_scratchpad): New.
(create_indirect_cond_expr): New.
(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
parameter for scratch_pad.
(combine_blocks): Same.
(tree_if_conversion): Same.

testsuite/
* g++.dg/tree-ssa/ifc-pr46029.C: New.
* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
* gcc.dg/tree-ssa/ifc-8.c: New.
* gcc.dg/tree-ssa/ifc-9.c: New.
* gcc.dg/tree-ssa/ifc-10.c: New.
* gcc.dg/tree-ssa/ifc-11.c: New.
* gcc.dg/tree-ssa/ifc-12.c: New.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
---
 gcc/common.opt |   2 +-
 gcc/doc/invoke.texi|  29 +-
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C|  76 
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c |  17 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c |  16 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c |  13 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c  |  19 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c  |  29 ++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c  |  17 +
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c  |  10 +-
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c  |  46 +++
 gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c  |   1 +
 gcc/testsuite/gcc.target/i386/avx2-gather-6.c  |   2 +-
 .../gcc.target/i386/avx2-vect-aggressive-1.c   |   2 +-
 .../gcc.target/i386/avx2-vect-aggressive.c |   3 +-
 gcc/tree-data-ref.c|  13 +-
 gcc/tree-data-ref.h  

Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by the o

2015-07-17 Thread Abe
nverts something like (same example as one of the above):

  if (C[index])  A[index] = foo(bar);
  // important: no else here

... into something like:

  /* the type of the scalar goes here */ * pointer = C[index] ? &A[index] : 
&scratchpad;
  __compiler_temp = foo(bar);
  *pointer = C[index] ? __compiler_temp : scratchpad;

... so that in the event of C[] being mostly false, most of the reads are 
_from_ the scratchpad
and most of the writes are _to_ the scratchpad.  This not only probably* saves 
a lot of the
potentially-massive number of wasted writes due to the old conversion rewriting 
all of A[]
no matter what the values of C[]`s elements are, it also saves a lot of reads 
when there is
enough clustering of false values in C[] that entire cache lines worth of A[] 
can be not
read from main RAM.  Caching and the "chunking effect" of cache lines, besides 
qualifying how
much reading is saved by the new conversion relative to the old one, also is 
the reason I needed
to qualify "probably* saves most/all the [...] wasted writes": it depends on 
how well or
how poorly the values in C[] are clustered.  In the case that is best for the 
new converter,
the new converter saves at least almost all the waste of the old converter`s 
code on both
the read path _and_ the write path [modulo at most one cache line of both read 
and write].


I hope the above helps to explain why the new converter is worth merging to 
trunk,
even though for now it still has performance regressions which
we expect to fix before the first GCC 6.x is tagged for release.


Regards,

Abe


Re: revised and updated new-if-converter patch… [PATCH] fix PR46029: reimplement if conversion of loads and stores

2015-07-20 Thread Abe

[Alan wrote:]
> Would having a testsuite predicate for the target supporting gathered loads
> let you run this test on those architectures? I'd expect one to be useful
> in a few other places too, in time if it doesn't exist already...

Thanks, but I don`t think that would fully fix all the regressions about which 
I am currently concerned.
I`m sure it could be useful to solve other problems, but as for the problems I 
am currently working on,
even though the above could easily provide a "way out", this is not the way in 
which I want to fix
the problem.  I`d much rather fix/improve the new if converter so that it 
doesn`t require
the target architecture to be able to gather/scatter when there is no 
fundamental need to do so,
e.g. when the vectorizer is complaining with messages that include the word 
"gather" only because
the new if converter is being "wasteful" with indirections that are not truly 
necessary and which
the vectorizer cannot tolerate [and for which there is no intermediate pass 
that eliminates them].

In other words, the new if converter IMO needs to be more careful, more 
analytical, and more
parsimonious with its use of indirection than it is right now.  It`s already a 
lot better than
the old converter, but it`s not perfect yet.  IMO it`s good enough to push to 
trunk already
largely b/c we have a lot of time IMO-and-AFAIK before 6.x is tagged for 
release.  The sooner
this gets pushed to trunk, the sooner we`ll have more "eyeballs" on this code 
and more people
might have good ideas for how to fix the regressions and otherwise improve the 
new code.

One of the best/most-important things I learned during my undergraduate 
Computer Science
education is "all problems in CS can be solved by adding another layer/level of 
indirection,
_except_ for the problem “I/we/the-program have/has too many layers/levels of 
indirection”".  ;-)

Regards,

Abe


Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t

2015-07-28 Thread Abe
C[index] being convertible to true is very low,
   _and_ the statically-estimated evaluation cost of "exp(x)"
   is both under the maximum and too close to that maximum}.
Arguably, barring bugs in the cost estimation,
if this happens too often in real-world code,
then we have set the maximum too high and should adjust it.



Dependent on whether the scratchpad is shared for different sized accesses
or not the scratchpad may also introduce store hazards in the CPU
because if scratchpad-as-int = ...; follows scratchpad-as-short = ...;
the latter only partially kills the preceeding store to the scratchpad
(usually CPUs can't merge the two requests in the store buffer).


It seems to me that the worst that can happen as a result of the preceding
is that non-{in-order} CPUs will be somewhat restricted in how much they
can re-order some of the internal operations that result from the decoding
of the machine-code instructions.  This seems to me to not be a terribly big
problem since the code in question should be in a vectorized loop anyway.
If the preceding causes a long stall on a large # of CPUs, then we
should do something about it, of course.  Otherwise, I propose that
the preceding may be a largely-theoretical performance problem
the solving of which doesn`t need to be a high priority
until/unless it is proven to be a very real-world problem.



Not sure which approach to allocating the scratchpad you are using currently.


We are currently using something at least "morally equivalent" to a single:

  char scratchpad[64];

... for each routine with a conversion that triggers scratchpad generation.



I still have to review the patch itself ... (and so many others)


The work of a mother or a GCC maintainer is never done.  ;-)



How do you expect to fix the performance regressions?


Mostly by the fruit of my labor with help from Sebastian.  I`d _really_ like
to get the new converter into trunk ASAP so that there will be many more
"eyes" on the code.  Also, it is important IMO that this code should not
languish for months/years [again], since that is what happened approx. 5 years
ago when Sebastian wrote it in the first place.  I don`t think a branch is
the right place for this code, since the branch would allow the code to die.

Regards,

Abe


Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t

2015-07-29 Thread Abe

Well.  We don't generally introduce regressions with changes.


Understood.  Regressions are bad, of course.  TTBOMK the
regressions in question are temporary.  Once they are gone,
I think we can then look at whether or not we still
need to keep the old if converter in trunk.  Ideally,
it eventually becomes redundant and unneeded.



(well, the patch still needs review -

> I hope to get to that this week).

After I`ve done the SPEC-based analysis, my next planned steps
on this work are to disable the code that [in my WIP] currently
causes conversion to be enabled by default when autovectorization
is enabled, then to re-integrate the old converter and implement
the switches that will give GCC users access to the modes I described
in a recent email from me.  You might prefer to delay your code review
until I have that all done and a new version of the patch submitted.

Regards,

Abe


Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t

2015-07-31 Thread Abe

[Abe wrote:]

After I`ve done the SPEC-based analysis, my next planned steps
on this work are to disable the code that [in my WIP] currently
causes conversion to be enabled by default when autovectorization
is enabled, then to re-integrate the old converter and implement
the switches that will give GCC users access to the modes I described
in a recent email from me.  You might prefer to delay your code review
until I have that all done and a new version of the patch submitted.


[Richard wrote:]

I'm not sure we want two if-converters.


Not _permanently_, no.  However, having the old one available and be
the one that is activated -- [1] by default under conditions in which
current GCC trunk activates if conversion by default, and [2] when the
existing/old flags call for it -- allows us to have the old converter
still apply in all cases in which it currently applies, to have the
new converter in trunk so it has eyes on it and Sebastian and myself
won`t be the only people working on it anymore [I hope ;-)].

The preceding will also make comparison of the two converters easier
[use the same compiler build, just vary your flags] and more reliable
[all other compiler components will be identical, so we have
 a stronger guarantee of apples-to-apples comparison].  All of that
will help us make the new converter better and better until the old
one becomes completely unneeded, at which point we can remove it
and "massage" the user-visible flags and the semantics thereof --
ideally only doing this in a release just before the release of
a new GCC major version number so we don`t introduce breaking
changes in the middle of a major version number, of course.
[not pretending to "teach" anybody anything here --
 just showing that I, too, understand the basics of releasing
 decent software -- or at least _some_ of those basics  ;-)]



What we do want is avoid using a scratch-pad

> if it is safe to do (for loads and stores)

I strongly agree.  In fact, TTBOMK that is a big part of improving
the new converter so it no longer has performance regressions.



and if the user tells us he is fine with store data races (for stores).


Wouldn`t it be nice -- if we can do so without killing ourselves over it --
to only take the flag in question as _permission_ to generate a data race?
IOW, it would be nice if a cost-model-based analysis would be able to tell
us e.g. "for this loop, it`s worth it due to the many saved machine cycles"
versus "for this loop, it`s not worth it: you only stand to save a cycle
or two, and you potentially sacrifice some correctness in the process".



Does the "new" if-converter get rid of the analysis code that
determined "safe"?  If so you should re-instantiate that.


I don`t have a good answer for that right now and don`t anticipate
having the correct answer soon enough that I should delay
this reply IMO.  I`ll get back to you on this question.

Regards,

Abe


Re: Minor typo fixes

2015-08-05 Thread Abe

Thank you, sir.  :-)

Regards,

Abe




On 8/3/15 10:53 PM, Jeff Law wrote:


I was starting to look at Abe's changes to the gimple if-converter and realized 
 a handful
of the changes were just fixing comments.  No reason those shouldn't go in 
immediately.
So I pulled them out and applied those changes to the trunk.



Abe -- if you find more of those kind of changes, don't hesitate  to break
them out into their own patch and they can go forward very quickly.



Attached are the fixes that were actually applied.  They were bootstrapped for 
completeness.


[PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]

2015-06-30 Thread Abe

New relative to previous version of same patch:
  * addressed comments/suggestions from Alan and Richard.
  * disabled the test cases that are currently not if-converted correctly, 
pending improvements to the new if-converter

Passed regression testing and bootstrap on amd64-linux.

2015-06-12  Sebastian Pop  
Abe Skolnik  

PR tree-optimization/46029
* tree-data-ref.c (struct data_ref_loc_d): Moved...
(get_references_in_stmt): Exported.
* tree-data-ref.h (struct data_ref_loc_d): ... here.
(get_references_in_stmt): Declared.

* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
* tree-if-conv.c (struct ifc_dr): Removed.
(IFC_DR): Removed.
(DR_WRITTEN_AT_LEAST_ONCE): Removed.
(DR_RW_UNCONDITIONALLY): Removed.
(memrefs_read_or_written_unconditionally): Removed.
(write_memrefs_written_at_least_once): Removed.
(ifcvt_could_trap_p): Does not take refs parameter anymore.
(ifcvt_memrefs_wont_trap): Removed.
(has_non_addressable_refs): New.
(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
Removed use of refs.
(if_convertible_stmt_p): Removed use of refs.
(if_convertible_gimple_assign_stmt_p): Same.
(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
(insert_address_of): New.
(create_scratchpad): New.
(create_indirect_cond_expr): New.
(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
parameter for scratch_pad.
(combine_blocks): Same.
(tree_if_conversion): Same.

testsuite/
* g++.dg/tree-ssa/ifc-pr46029.C: New.
* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
* gcc.dg/tree-ssa/ifc-8.c: New.
* gcc.dg/tree-ssa/ifc-9.c: New.
* gcc.dg/tree-ssa/ifc-10.c: New.
* gcc.dg/tree-ssa/ifc-11.c: New.
* gcc.dg/tree-ssa/ifc-12.c: New.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
* gcc.dg/vect/pr61194.c: Disabled.
* gcc.dg/vect/vect-mask-load-1.c: Disabled.
* gcc.dg/vect/vect-mask-loadstore-1.c: Disabled.
* gcc.target/i386/avx2-gather-6.c: Disabled.
* gcc.target/i386/avx2-vect-aggressive-1.c: Disabled.
* gcc.target/i386/avx2-vect-aggressive.c: Disabled.
---
 gcc/ChangeLog  |  28 ++
 gcc/common.opt |   2 +-
 gcc/doc/invoke.texi|  18 +-
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C|  76 
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c |  17 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c |  16 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c |  13 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c  |  19 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c  |  29 ++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c  |  17 +
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c  |  10 +-
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c  |  46 +++
 gcc/testsuite/gcc.dg/vect/pr61194.c|   4 +-
 gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c   |   4 +-
 gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c  |   4 +-
 gcc/testsuite/gcc.target/i386/avx2-gather-6.c  |   4 +-
 .../gcc.target/i386/avx2-vect-aggressive-1.c   |   4 +-
 .../gcc.target/i386/avx2-vect-aggressive.c |   4 +-
 gcc/tree-data-ref.c|  13 +-
 gcc/tree-data-ref.h|  14 +
 gcc/tree-if-conv.c | 416 ++---
 22 files changed, 500 insertions(+), 260 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3dec6b1..70af07c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2015-05-18  Sebastian Pop  
+
+   PR tree-optimization/46029
+   * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
+   * tree-if-conv.c (has_unaligned_memory_refs): New.
+   (if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
+   (create_scratchpad): New.
+   (create_indirect_cond_expr): New.
+   (predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
+   parameter 

Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]

2015-07-02 Thread Abe

On 7/2/15 4:49 AM, Alan Lawrence wrote:


Thanks, Abe.


You are welcome, sir!  :-)



As before, I'm still confused here. This still returns false, i.e. bails out of
if-conversion, if the statement could trap. Doesn't the scratchpad let us handle
that? Or do we just not care because it won't be vectorizable anyway???


This seems like an opportunity for more optimization in the future.  However, 
at this time we do not see what kind of code would benefit from this 
optimization.
If you have some time to spare and wish to spend some of it on this issue, then 
please find/write a test case that would exercise this path, i.e. a snippet of 
code
that is not optimized due to the above concern, even though it _could_ be 
if-converted -- and, preferably, the resulting conversion _is_ 
vectorizer-friendly.

Of course, even if a test case can be written to trigger this missed 
opportunity,
that in and of itself does not yet tell us how much opportunity we are missing 
in _real-world_ code.



Nit: as before [...]


Thanks for the reminder[s].



Nit: as before - thanks for fixing the example here


You are welcome.



Where can I find info on what the different flag values mean?

> (I had thought they were booleans [...]

Sorry; I don`t know if that is documented anywhere yet.

In this case, (-1) simply means "defaulted": on if the vectorizer is on, and 
off if it is off.
(0) means "user specified no if conversion" and (1) means "user specified [yes] if 
conversion".

Regards,

Abe


Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]

2015-07-06 Thread Abe

[Abe wrote:]

This seems like an opportunity for more optimization in the future


[On 7/6/15 10:09 AM, Alan Lawrence wrote:]

we get enough benefit from the patch, even without my suggested extra change. 
Ok, fair enough! Thanks for the clarification.


You are welcome, sir.


[Alan wrote:]

Where can I find info on what the different flag values mean?
(I had thought they were booleans [...]


[Abe wrote:]

Sorry; I don`t know if that is documented anywhere yet.
In this case, (-1) simply means "defaulted": on if the vectorizer is on, and 
off if it is off.
(0) means "user specified no if conversion" and (1) means "user specified [yes] if 
conversion".


[Alan wrote:]

Ah, right, that makes sense now. Obviously I would like to see this written in 
doc/ .


Please consider it added to the work queue.  ;-)

Regards,

Abe



RE: fix PR46029: reimplement if conversion of loads and stores

2015-07-07 Thread Abe

(if-conversion could directly generate masked load/stores
 of course and not use a scratch-pad at all in that case).


IMO that`s a great idea, but I don`t know how to do it.  Hints would be welcome.  In 
particular, how does one "generate masked load/stores" at the GIMPLE level?



But are we correctly handling these cases in the current if conversion code?


I`m uncertain to what that is intended to refer, but I believe Sebastian would 
agree that the new if converter is safer than the old one in terms of 
correctness at the time of running the code being compiled.



Abe's changes would seem like a step forward from a correctness standpoint


Not to argue, but as a point of humility: Sebastian did by far the most work on 
this patch.  I just modernized it and helped move it along.  Even _that_ was 
done with Sebastian`s help.



even if they take us a step backwards from a performance standpoint.


For now, we have a few performance regressions, and so far we have found that 
it`s non-trivial to remove all of those regressions.
We may be better off pushing the current patch to trunk and having the 
performance regressions currently introduced by the new if converter be fixed 
by later patches.
Pushing to trunk gives us excellent "visibility" amongst GCC hackers, so the code will 
get more "eyeballs" than if it lingers in an uncommitted patch or in a branch.
I, for one, would love some help in fixing these performance regressions. ;-)

If fixing the performance regressions winds up taking too long, perhaps the 
current imperfect patch could be undone on trunk just before a release is 
tagged,
and then we`ll push it in again when trunk goes back to being allowed to be 
unstable?  According to my analysis of the data near the end of the page at
<https://gcc.gnu.org/develop.html>, we have until roughly April of 2016 to work 
on not-yet-perfect patches in trunk.



So the question is whether we get more non-vectorized if-converted
code out of this (and thus whether we want to use --param
allow-store-data-races to get the old code back which is nicer to less
capable CPUs and probably faster than using scatter/gather or masked 
loads/stores).



I do think conditionalizing some of this on the allow-store-data-races makes 
sense.


I think having both the old if-converter and the new one "live on" in GCC is 
nontrivial, but not impossible.  I also don`t think it`s the best long-term goal,
but only a short-term workaround.  In the long run, IMO there should be only one if 
converter, albeit perhaps with tweaking flags [e.g. 
"-fallow-unsafe-if-conversion"].



I also wonder if we should really care about load data races (not sure your 
patch does).


According to a recent long discussion I had with Sebastian, our current patch 
does not have the flaw I was concerned it might have in terms of loads because:

  [1] the scratchpad is only being used to if-convert assignments to 
thread-local scalars, never to globals/statics, and because...

  [2] the gimplifier is supposed to detect "the address of this scalar has been 
taken" and when such is detected in the code being compiled,
  it causes the scalar to no longer look like a scalar in GIMPLE so that we 
are also safe from stale-data problems that could come from
  corner-case code that takes the address of a thread-local variable and 
gives that address to another thread [which then proceeds to
  overwrite the value in the supposedly-thread-local scalar that belongs to 
a different thread from the one doing the writing]


Regards,

Abe




Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]

2015-07-08 Thread Abe

[Alan wrote:]


Where can I find info on what the different flag values mean?
(I had thought they were booleans [...]



[Abe wrote:]


Sorry; I don`t know if that is documented anywhere yet.



In this case, (-1) simply means "defaulted": on if the vectorizer is on, and 
off if it is off.
(0) means "user specified no if conversion" and (1) means "user specified [yes] if 
conversion".



[Alan wrote:]


Ah, right, that makes sense now. Obviously I would like to see this written in 
doc/ .


Is the following sufficient?  From "doc/invoke.texi", I propose to replace:

  This is enabled by default if vectorization is enabled.

... with:

  This is enabled by default when vectorization is  enabled
  anddisabled by default when vectorization is disabled.


Regards,

Abe


Re: fix PR46029: reimplement if conversion of loads and stores

2015-07-08 Thread Abe

[Abe wrote:]

I believe Sebastian would agree that the new if converter  is safer than the 
old one

>> in terms of correctness at the time of running the code being compiled.
[...]

For now, we have a few performance regressions

[snip]


[Alan wrote:]

TBH my two cents would be that a performance-regressed, but correct, compiler, 
is far better to release,  than a

> performance-"improved" one that generates unsafe code (e.g. with extra faults 
in the straightforward single-threaded case!)...

I strongly agree that -- by default -- correctness trumps performance.  The 
only times it is allowable to reverse that relationship, IMO,
is when the user* of the compiler has explicitly specified flags [e.g. "-ffast-math", 
"-Ofast"] that tell the compiler that the user*
currently cares more about performance than about 
{correctness-according-to-spec and/or safety in all conditions including null 
pointers}.

['*': or Makefiles, build scripts, etc.]

FYI: TTBOMK, the old if converter was not unsafe with default flags or with only "big 
knobs" like "-O3"; I`m unsure what it did
under "-ffast-math" and "-Ofast", if anything of interest.  The main advantage 
of the new if converter over the old one is that
the new one is safe in certain situations wherein the old one is unsafe, e.g. 
the old one may cause the vectorized code to segfault
where the non-if-converted code would have run just fine all the way to program 
completion with the same inputs.  This additional
safety allows the new converter to be used under more conditions, which in turn 
allows it to be enabled by default.  We intend
for all the safe if-conversions to be done by default whenever the vectorizer 
is on.  If there are any unsafe conversions left,
which I`m not sure there are, then we will enable them only when the user* specifies 
something like "-fif-conversion=allow-unsafe".
The "allows it to be enabled by default" property should help the code that GCC generates 
under "-O3" w/o any additional
flags to be faster than it currently is, for the relevant targets, *without 
sacrificing even _one_ _bit_ of correctness*.

Regards,

Abe


Re: fix PR46029: reimplement if conversion of loads and stores

2015-07-08 Thread Abe

(if-conversion could directly generate masked load/stores
  of course and not use a scratch-pad at all in that case).


[Abe wrote:]

IMO that`s a great idea, but I don`t know how to do it.
Hints would be welcome.  In particular, how does one

>> "generate masked load/stores" at the GIMPLE level?

[Richard Biener wrote:]

It already does that, see predicate_mem_writes.

> You should definitely preserve that path

Thanks.  Yes, we have not intentionally disabled that.



On what hardware did you test?


AMD64 arch., Intel implementation.  Nothing fancy AFAIK in the flags to make it 
super-specific,
e.g. "-march=nocona" or "-march=native".  Occasionally using AVX-2 flags as 
specified by some test cases.

Regards,

Abe


Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]

2015-07-08 Thread Abe

[Abe wrote:]

Is the following sufficient?  From "doc/invoke.texi", I propose to replace:



   This is enabled by default if vectorization is enabled.



... with:



   This is enabled by default when vectorization is  enabled
   anddisabled by default when vectorization is disabled.




[Alan wrote:]


That would be helpful - but has the syntax not changed?


AFAIK & IIRC, not yet.  We plan to change the flags later.


[Alan wrote:]

I was thinking it'd be useful to document somewher the meanings of 1/0/-1, as 
you explained them to me:

[snip]

Perhaps in the gate() function if not in the doc.


OK.  It seemed like the doc files was not the right place to document internal 
values that flags influence.

Regards,

Abe



[PATCH] vax: resolve long-standing documentation bugs re floating-point codegen [PR79646]

2024-04-26 Thread Abe Skolnik
Howdy, y`all.

After many years away from contributing to GCC, I am rejoining as a "gardener": 
my intent/plan is to clean up the bug backlog, weeding out old bugs that are 
relatively-easy to fix yet have languished for a long time.  First for today`s 
gardening: a documentation-only bug or two related to how floating-point code 
is generated for VAX-compatible CPUs.  Yes, _VAX_.

I made sure to update the translation files, using an on-line translator for 
languages which I can`t really read/write on my own.  In at least one case 
where the translation was empty before, I left it empty.  In at least one case 
where the translation was empty before, I filled it in.

Copyright assignment to the FSF, yadda yadda yadda...  all the usual.

Sincerely,

Abe
From ec5f259d0e7dd7dcd1194f775bf00d3decb786f3 Mon Sep 17 00:00:00 2001
From: Abe 
Date: Wed, 24 Apr 2024 21:06:50 -0400
Subject: [PATCH] PR79646: corrected VAX-specific message strings related to
 double-precision codegen, updated+improved translations of those messages.

PR target/79646 - Typos in vax.opt

	PR target/79646

gcc/ChangeLog:

	* config/vax/vax.opt:

gcc/po/ChangeLog:

	* be.po:
	* da.po:
	* de.po:
	* el.po:
	* es.po:
	* fi.po:
	* fr.po:
	* hr.po:
	* id.po:
	* ja.po:
	* nl.po:
	* ru.po:
	* sr.po:
	* sv.po:
	* tr.po:
	* uk.po:
	* vi.po:
	* zh_CN.po:
	* zh_TW.po:

---
 gcc/config/vax/vax.opt |  8 
 gcc/po/be.po   |  8 
 gcc/po/da.po   | 10 --
 gcc/po/de.po   |  8 
 gcc/po/el.po   |  8 
 gcc/po/es.po   |  8 
 gcc/po/fi.po   |  8 
 gcc/po/fr.po   |  8 
 gcc/po/hr.po   |  4 ++--
 gcc/po/id.po   |  8 
 gcc/po/ja.po   |  8 
 gcc/po/nl.po   |  4 ++--
 gcc/po/ru.po   |  8 
 gcc/po/sr.po   | 10 --
 gcc/po/sv.po   |  6 +++---
 gcc/po/tr.po   | 10 --
 gcc/po/uk.po   |  4 ++--
 gcc/po/vi.po   |  8 
 gcc/po/zh_CN.po| 10 --
 gcc/po/zh_TW.po| 10 --
 20 files changed, 73 insertions(+), 83 deletions(-)

diff --git a/gcc/config/vax/vax.opt b/gcc/config/vax/vax.opt
index 2cc66e543fe..fa2be78e9fa 100644
--- a/gcc/config/vax/vax.opt
+++ b/gcc/config/vax/vax.opt
@@ -20,19 +20,19 @@
 
 md
 Target RejectNegative InverseMask(G_FLOAT)
-Target DFLOAT double precision code.
+Generate DFLOAT double-precision code.
 
 md-float
 Target RejectNegative InverseMask(G_FLOAT)
-Target DFLOAT double precision code.
+Generate DFLOAT double-precision code.
 
 mg
 Target RejectNegative Mask(G_FLOAT)
-Generate GFLOAT double precision code.
+Generate GFLOAT double-precision code.
 
 mg-float
 Target RejectNegative Mask(G_FLOAT)
-Generate GFLOAT double precision code.
+Generate GFLOAT double-precision code.
 
 mgnu
 Target RejectNegative InverseMask(UNIX_ASM)
diff --git a/gcc/po/be.po b/gcc/po/be.po
index ae92f9b7b96..db102673665 100644
--- a/gcc/po/be.po
+++ b/gcc/po/be.po
@@ -11502,13 +11502,13 @@ msgstr ""
 
 #: config/vax/vax.opt:23 config/vax/vax.opt:27
 #, no-c-format
-msgid "Target DFLOAT double precision code."
-msgstr ""
+msgid "Generate DFLOAT double-precision code."
+msgstr "Генерыраваць код double-precision для DFLOAT."
 
 #: config/vax/vax.opt:31 config/vax/vax.opt:35
 #, no-c-format
-msgid "Generate GFLOAT double precision code."
-msgstr ""
+msgid "Generate GFLOAT double-precision code."
+msgstr "Генерыраваць код double-precision для GFLOAT."
 
 #: config/vax/vax.opt:39
 #, fuzzy, no-c-format
diff --git a/gcc/po/da.po b/gcc/po/da.po
index 9871ec5b82b..7d82a108897 100644
--- a/gcc/po/da.po
+++ b/gcc/po/da.po
@@ -12372,15 +12372,13 @@ msgstr ""
 
 #: config/vax/vax.opt:23 config/vax/vax.opt:27
 #, fuzzy, no-c-format
-#| msgid "Generate GFLOAT double precision code"
-msgid "Target DFLOAT double precision code."
-msgstr "Opret GFLOAT dobbeltpræcision kode"
+msgid "Generate DFLOAT double-precision code."
+msgstr "Opret GFLOAT dobbeltpræcision kode."
 
 #: config/vax/vax.opt:31 config/vax/vax.opt:35
 #, fuzzy, no-c-format
-#| msgid "Generate GFLOAT double precision code"
-msgid "Generate GFLOAT double precision code."
-msgstr "Opret GFLOAT dobbeltpræcision kode"
+msgid "Generate GFLOAT double-precision code."
+msgstr "Opret GFLOAT dobbeltpræcision kode."
 
 #: config/vax/vax.opt:39
 #, fuzzy, no-c-format
diff --git a/gcc/po/de.po b/gcc/po/de.po
index db42d7191ff..1303b189a4b 100644
--- a/gcc/po/de.po
+++ b/gcc/po/de.po
@@ -11333,13 +11333,13 @@ msgstr "Daten werden ausgehend vom Start des Programmcodes statt der GOT adressi
 
 #: config/vax/vax.opt:23 config/vax/vax.opt:27
 #, no-c-format
-msgid "Target DFLOAT double precision code."
-msgstr &q

fix PR46029: reimplement if conversion of loads and stores

2015-06-12 Thread Abe Skolnik
Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

  * loads were always executed, even when they should have not been.
Some source code could be rendered invalid due to null pointers
that were OK in the original program because they were never
dereferenced.

  * writes were if-converted via load/maybe-modify/store, which
renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

  * loads are done through an indirection, reading either the correct
data from the correct source [if the condition is true] or reading
from the scratchpad and later ignoring this read result [if the
condition is false].

  * writes are also done through an indirection, writing either to the
correct destination [if the condition is true] or to the
scratchpad [if the condition is false].

Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?

Regards,

Abe




2015-06-12  Sebastian Pop  
        Abe Skolnik  

PR tree-optimization/46029
* tree-data-ref.c (struct data_ref_loc_d): Moved...
(get_references_in_stmt): Exported.
* tree-data-ref.h (struct data_ref_loc_d): ... here.
(get_references_in_stmt): Declared.

* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
* tree-if-conv.c (struct ifc_dr): Removed.
(IFC_DR): Removed.
(DR_WRITTEN_AT_LEAST_ONCE): Removed.
(DR_RW_UNCONDITIONALLY): Removed.
(memrefs_read_or_written_unconditionally): Removed.
(write_memrefs_written_at_least_once): Removed.
(ifcvt_could_trap_p): Does not take refs parameter anymore.
(ifcvt_memrefs_wont_trap): Removed.
(has_non_addressable_refs): New.
(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
Removed use of refs.
(if_convertible_stmt_p): Removed use of refs.
(if_convertible_gimple_assign_stmt_p): Same.
(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
(insert_address_of): New.
(create_scratchpad): New.
(create_indirect_cond_expr): New.
(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
parameter for scratch_pad.
(combine_blocks): Same.
(tree_if_conversion): Same.

testsuite/
* g++.dg/tree-ssa/ifc-pr46029.C: New.
* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
* gcc.dg/tree-ssa/ifc-8.c: New.
* gcc.dg/tree-ssa/ifc-9.c: New.
* gcc.dg/tree-ssa/ifc-10.c: New.
* gcc.dg/tree-ssa/ifc-11.c: New.
* gcc.dg/tree-ssa/ifc-12.c: New.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
---
 gcc/ChangeLog  |  28 ++
 gcc/doc/invoke.texi|  18 +-
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C|  76 
 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c |  17 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c |  16 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c |  13 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c  |  19 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c  |  29 ++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c  |  17 +
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c  |  10 +-
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c  |  46 +++
 gcc/tree-data-ref.c|  13 +-
 gcc/tree-data-ref.h|  14 +
 gcc/tree-if-conv.c | 392 +
 14 files changed, 460 insertions(+), 248 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3dec6b1..70af07c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2015-05-18  Sebastian Pop  
+
+   PR tree-optimization/46029
+   * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
+   * tree-if-conv.c (has_unaligned_memory_refs): New.
+   (if_convertible_gimple_assign_