[PATCH] fix PR46029: reimplement if conversion of loads and stores [3nd submitted version of patch]
Below, please find the 3nd submitted version of this patch, now with some more issues resolved. Regards, Abe From 87af575347e216672e322bbc1b4ae0a5ab93507f Mon Sep 17 00:00:00 2001 From: Abe Date: Mon, 18 May 2015 14:26:29 -0500 Subject: [PATCH] fix PR46029: reimplement if conversion of loads and stores New relative to previous version of same patch: * addressed comments/suggestions from Alan and Richard. * disabled the test cases that are currently not if-converted correctly, pending improvements to the new if-converter Passed regression testing and bootstrap on AMD64 GNU/Linux [AKA "x86_64-unknown-linux-gnu"]. 2015-06-12 Sebastian Pop Abe Skolnik PR tree-optimization/46029 * tree-data-ref.c (struct data_ref_loc_d): Moved... (get_references_in_stmt): Exported. * tree-data-ref.h (struct data_ref_loc_d): ... here. (get_references_in_stmt): Declared. * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description. * tree-if-conv.c (struct ifc_dr): Removed. (IFC_DR): Removed. (DR_WRITTEN_AT_LEAST_ONCE): Removed. (DR_RW_UNCONDITIONALLY): Removed. (memrefs_read_or_written_unconditionally): Removed. (write_memrefs_written_at_least_once): Removed. (ifcvt_could_trap_p): Does not take refs parameter anymore. (ifcvt_memrefs_wont_trap): Removed. (has_non_addressable_refs): New. (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs. Removed use of refs. (if_convertible_stmt_p): Removed use of refs. (if_convertible_gimple_assign_stmt_p): Same. (if_convertible_loop_p_1): Removed use of refs. Remove initialization of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY. (insert_address_of): New. (create_scratchpad): New. (create_indirect_cond_expr): New. (predicate_mem_writes): Call create_indirect_cond_expr. Take an extra parameter for scratch_pad. (combine_blocks): Same. (tree_if_conversion): Same. testsuite/ * g++.dg/tree-ssa/ifc-pr46029.C: New. * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel. * gcc.dg/tree-ssa/ifc-8.c: New. * gcc.dg/tree-ssa/ifc-9.c: New. * gcc.dg/tree-ssa/ifc-10.c: New. * gcc.dg/tree-ssa/ifc-11.c: New. * gcc.dg/tree-ssa/ifc-12.c: New. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New. * gcc.dg/vect/pr61194.c: Disabled. * gcc.dg/vect/vect-mask-load-1.c: Disabled. * gcc.dg/vect/vect-mask-loadstore-1.c: Disabled. * gcc.target/i386/avx2-gather-6.c: Disabled. * gcc.target/i386/avx2-vect-aggressive-1.c: Disabled. * gcc.target/i386/avx2-vect-aggressive.c: Disabled. --- gcc/ChangeLog | 28 ++ gcc/common.opt | 2 +- gcc/doc/invoke.texi| 29 +- gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C| 76 gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c | 17 + gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c | 16 + gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c | 13 + gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c | 19 +- gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c | 29 ++ gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c | 17 + .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c | 10 +- .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c | 46 +++ gcc/testsuite/gcc.dg/vect/pr61194.c| 4 +- gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c | 4 +- gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c | 4 +- gcc/testsuite/gcc.target/i386/avx2-gather-6.c | 4 +- .../gcc.target/i386/avx2-vect-aggressive-1.c | 4 +- .../gcc.target/i386/avx2-vect-aggressive.c | 4 +- gcc/tree-data-ref.c| 13 +- gcc/tree-data-ref.h| 14 + gcc/tree-if-conv.c | 417 + gcc/tree-vect-stmts.c | 2 +- 23 files changed, 501 insertions(+), 273 deletions(-) create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3dec6b1..70af07c 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,31 @@ +2015-05-18 Sebastian Pop + +
revised and updated new-if-converter patch… [PATCH] fix PR46029: reimplement if conversion of loads and stores
Dear all, Relative to the previous submission of this same patch, the below corrects some minor spacing and/or indentation issues, misc. other formatting fixes, and makes the disabled vectorization tests be disabled via "xfail" rather than by adding spaces to deliberately cause the relevant scanned-for text to not be found by DejaGNU so as to prevent the DejaGNU line from being interpreted. The below is also based on a Git checkout that was rebased to the latest upstream check-in from today, so it should merge cleanly with trunk as of today. Regards, Abe From 89e115118dcc49d6839db2e9d7ae6c330789c235 Mon Sep 17 00:00:00 2001 Subject: [PATCH] fix PR46029: reimplement if conversion of loads and stores In the current implementation of if conversion, loads and stores are if-converted in a thread-unsafe way: * loads were always executed, even when they should have not been. Some source code could be rendered invalid due to null pointers that were OK in the original program because they were never dereferenced. * writes were if-converted via load/maybe-modify/store, which renders some code multithreading-unsafe. This patch reimplements if-conversion of loads and stores in a safe way using a scratchpad allocated by the compiler on the stack: * loads are done through an indirection, reading either the correct data from the correct source [if the condition is true] or reading from the scratchpad and later ignoring this read result [if the condition is false]. * writes are also done through an indirection, writing either to the correct destination [if the condition is true] or to the scratchpad [if the condition is false]. Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the old if-conversion resulted in unsafe code that could fail under multithreading even though the as-written code _was_ thread-safe. Passed regression testing and bootstrap on amd64-linux. 2015-06-12 Sebastian Pop Abe Skolnik PR tree-optimization/46029 * tree-data-ref.c (struct data_ref_loc_d): Moved... (get_references_in_stmt): Exported. * tree-data-ref.h (struct data_ref_loc_d): ... here. (get_references_in_stmt): Declared. * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description. * tree-if-conv.c (struct ifc_dr): Removed. (IFC_DR): Removed. (DR_WRITTEN_AT_LEAST_ONCE): Removed. (DR_RW_UNCONDITIONALLY): Removed. (memrefs_read_or_written_unconditionally): Removed. (write_memrefs_written_at_least_once): Removed. (ifcvt_could_trap_p): Does not take refs parameter anymore. (ifcvt_memrefs_wont_trap): Removed. (has_non_addressable_refs): New. (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs. Removed use of refs. (if_convertible_stmt_p): Removed use of refs. (if_convertible_gimple_assign_stmt_p): Same. (if_convertible_loop_p_1): Removed use of refs. Remove initialization of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY. (insert_address_of): New. (create_scratchpad): New. (create_indirect_cond_expr): New. (predicate_mem_writes): Call create_indirect_cond_expr. Take an extra parameter for scratch_pad. (combine_blocks): Same. (tree_if_conversion): Same. testsuite/ * g++.dg/tree-ssa/ifc-pr46029.C: New. * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel. * gcc.dg/tree-ssa/ifc-8.c: New. * gcc.dg/tree-ssa/ifc-9.c: New. * gcc.dg/tree-ssa/ifc-10.c: New. * gcc.dg/tree-ssa/ifc-11.c: New. * gcc.dg/tree-ssa/ifc-12.c: New. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New. --- gcc/common.opt | 2 +- gcc/doc/invoke.texi| 29 +- gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C| 76 gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c | 17 + gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c | 16 + gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c | 13 + gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c | 19 +- gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c | 29 ++ gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c | 17 + .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c | 10 +- .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c | 46 +++ gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c | 1 + gcc/testsuite/gcc.target/i386/avx2-gather-6.c | 2 +- .../gcc.target/i386/avx2-vect-aggressive-1.c | 2 +- .../gcc.target/i386/avx2-vect-aggressive.c | 3 +- gcc/tree-data-ref.c| 13 +- gcc/tree-data-ref.h
Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by the o
nverts something like (same example as one of the above): if (C[index]) A[index] = foo(bar); // important: no else here ... into something like: /* the type of the scalar goes here */ * pointer = C[index] ? &A[index] : &scratchpad; __compiler_temp = foo(bar); *pointer = C[index] ? __compiler_temp : scratchpad; ... so that in the event of C[] being mostly false, most of the reads are _from_ the scratchpad and most of the writes are _to_ the scratchpad. This not only probably* saves a lot of the potentially-massive number of wasted writes due to the old conversion rewriting all of A[] no matter what the values of C[]`s elements are, it also saves a lot of reads when there is enough clustering of false values in C[] that entire cache lines worth of A[] can be not read from main RAM. Caching and the "chunking effect" of cache lines, besides qualifying how much reading is saved by the new conversion relative to the old one, also is the reason I needed to qualify "probably* saves most/all the [...] wasted writes": it depends on how well or how poorly the values in C[] are clustered. In the case that is best for the new converter, the new converter saves at least almost all the waste of the old converter`s code on both the read path _and_ the write path [modulo at most one cache line of both read and write]. I hope the above helps to explain why the new converter is worth merging to trunk, even though for now it still has performance regressions which we expect to fix before the first GCC 6.x is tagged for release. Regards, Abe
Re: revised and updated new-if-converter patch… [PATCH] fix PR46029: reimplement if conversion of loads and stores
[Alan wrote:] > Would having a testsuite predicate for the target supporting gathered loads > let you run this test on those architectures? I'd expect one to be useful > in a few other places too, in time if it doesn't exist already... Thanks, but I don`t think that would fully fix all the regressions about which I am currently concerned. I`m sure it could be useful to solve other problems, but as for the problems I am currently working on, even though the above could easily provide a "way out", this is not the way in which I want to fix the problem. I`d much rather fix/improve the new if converter so that it doesn`t require the target architecture to be able to gather/scatter when there is no fundamental need to do so, e.g. when the vectorizer is complaining with messages that include the word "gather" only because the new if converter is being "wasteful" with indirections that are not truly necessary and which the vectorizer cannot tolerate [and for which there is no intermediate pass that eliminates them]. In other words, the new if converter IMO needs to be more careful, more analytical, and more parsimonious with its use of indirection than it is right now. It`s already a lot better than the old converter, but it`s not perfect yet. IMO it`s good enough to push to trunk already largely b/c we have a lot of time IMO-and-AFAIK before 6.x is tagged for release. The sooner this gets pushed to trunk, the sooner we`ll have more "eyeballs" on this code and more people might have good ideas for how to fix the regressions and otherwise improve the new code. One of the best/most-important things I learned during my undergraduate Computer Science education is "all problems in CS can be solved by adding another layer/level of indirection, _except_ for the problem “I/we/the-program have/has too many layers/levels of indirection”". ;-) Regards, Abe
Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t
C[index] being convertible to true is very low, _and_ the statically-estimated evaluation cost of "exp(x)" is both under the maximum and too close to that maximum}. Arguably, barring bugs in the cost estimation, if this happens too often in real-world code, then we have set the maximum too high and should adjust it. Dependent on whether the scratchpad is shared for different sized accesses or not the scratchpad may also introduce store hazards in the CPU because if scratchpad-as-int = ...; follows scratchpad-as-short = ...; the latter only partially kills the preceeding store to the scratchpad (usually CPUs can't merge the two requests in the store buffer). It seems to me that the worst that can happen as a result of the preceding is that non-{in-order} CPUs will be somewhat restricted in how much they can re-order some of the internal operations that result from the decoding of the machine-code instructions. This seems to me to not be a terribly big problem since the code in question should be in a vectorized loop anyway. If the preceding causes a long stall on a large # of CPUs, then we should do something about it, of course. Otherwise, I propose that the preceding may be a largely-theoretical performance problem the solving of which doesn`t need to be a high priority until/unless it is proven to be a very real-world problem. Not sure which approach to allocating the scratchpad you are using currently. We are currently using something at least "morally equivalent" to a single: char scratchpad[64]; ... for each routine with a conversion that triggers scratchpad generation. I still have to review the patch itself ... (and so many others) The work of a mother or a GCC maintainer is never done. ;-) How do you expect to fix the performance regressions? Mostly by the fruit of my labor with help from Sebastian. I`d _really_ like to get the new converter into trunk ASAP so that there will be many more "eyes" on the code. Also, it is important IMO that this code should not languish for months/years [again], since that is what happened approx. 5 years ago when Sebastian wrote it in the first place. I don`t think a branch is the right place for this code, since the branch would allow the code to die. Regards, Abe
Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t
Well. We don't generally introduce regressions with changes. Understood. Regressions are bad, of course. TTBOMK the regressions in question are temporary. Once they are gone, I think we can then look at whether or not we still need to keep the old if converter in trunk. Ideally, it eventually becomes redundant and unneeded. (well, the patch still needs review - > I hope to get to that this week). After I`ve done the SPEC-based analysis, my next planned steps on this work are to disable the code that [in my WIP] currently causes conversion to be enabled by default when autovectorization is enabled, then to re-integrate the old converter and implement the switches that will give GCC users access to the modes I described in a recent email from me. You might prefer to delay your code review until I have that all done and a new version of the patch submitted. Regards, Abe
Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t
[Abe wrote:] After I`ve done the SPEC-based analysis, my next planned steps on this work are to disable the code that [in my WIP] currently causes conversion to be enabled by default when autovectorization is enabled, then to re-integrate the old converter and implement the switches that will give GCC users access to the modes I described in a recent email from me. You might prefer to delay your code review until I have that all done and a new version of the patch submitted. [Richard wrote:] I'm not sure we want two if-converters. Not _permanently_, no. However, having the old one available and be the one that is activated -- [1] by default under conditions in which current GCC trunk activates if conversion by default, and [2] when the existing/old flags call for it -- allows us to have the old converter still apply in all cases in which it currently applies, to have the new converter in trunk so it has eyes on it and Sebastian and myself won`t be the only people working on it anymore [I hope ;-)]. The preceding will also make comparison of the two converters easier [use the same compiler build, just vary your flags] and more reliable [all other compiler components will be identical, so we have a stronger guarantee of apples-to-apples comparison]. All of that will help us make the new converter better and better until the old one becomes completely unneeded, at which point we can remove it and "massage" the user-visible flags and the semantics thereof -- ideally only doing this in a release just before the release of a new GCC major version number so we don`t introduce breaking changes in the middle of a major version number, of course. [not pretending to "teach" anybody anything here -- just showing that I, too, understand the basics of releasing decent software -- or at least _some_ of those basics ;-)] What we do want is avoid using a scratch-pad > if it is safe to do (for loads and stores) I strongly agree. In fact, TTBOMK that is a big part of improving the new converter so it no longer has performance regressions. and if the user tells us he is fine with store data races (for stores). Wouldn`t it be nice -- if we can do so without killing ourselves over it -- to only take the flag in question as _permission_ to generate a data race? IOW, it would be nice if a cost-model-based analysis would be able to tell us e.g. "for this loop, it`s worth it due to the many saved machine cycles" versus "for this loop, it`s not worth it: you only stand to save a cycle or two, and you potentially sacrifice some correctness in the process". Does the "new" if-converter get rid of the analysis code that determined "safe"? If so you should re-instantiate that. I don`t have a good answer for that right now and don`t anticipate having the correct answer soon enough that I should delay this reply IMO. I`ll get back to you on this question. Regards, Abe
Re: Minor typo fixes
Thank you, sir. :-) Regards, Abe On 8/3/15 10:53 PM, Jeff Law wrote: I was starting to look at Abe's changes to the gimple if-converter and realized a handful of the changes were just fixing comments. No reason those shouldn't go in immediately. So I pulled them out and applied those changes to the trunk. Abe -- if you find more of those kind of changes, don't hesitate to break them out into their own patch and they can go forward very quickly. Attached are the fixes that were actually applied. They were bootstrapped for completeness.
[PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]
New relative to previous version of same patch: * addressed comments/suggestions from Alan and Richard. * disabled the test cases that are currently not if-converted correctly, pending improvements to the new if-converter Passed regression testing and bootstrap on amd64-linux. 2015-06-12 Sebastian Pop Abe Skolnik PR tree-optimization/46029 * tree-data-ref.c (struct data_ref_loc_d): Moved... (get_references_in_stmt): Exported. * tree-data-ref.h (struct data_ref_loc_d): ... here. (get_references_in_stmt): Declared. * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description. * tree-if-conv.c (struct ifc_dr): Removed. (IFC_DR): Removed. (DR_WRITTEN_AT_LEAST_ONCE): Removed. (DR_RW_UNCONDITIONALLY): Removed. (memrefs_read_or_written_unconditionally): Removed. (write_memrefs_written_at_least_once): Removed. (ifcvt_could_trap_p): Does not take refs parameter anymore. (ifcvt_memrefs_wont_trap): Removed. (has_non_addressable_refs): New. (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs. Removed use of refs. (if_convertible_stmt_p): Removed use of refs. (if_convertible_gimple_assign_stmt_p): Same. (if_convertible_loop_p_1): Removed use of refs. Remove initialization of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY. (insert_address_of): New. (create_scratchpad): New. (create_indirect_cond_expr): New. (predicate_mem_writes): Call create_indirect_cond_expr. Take an extra parameter for scratch_pad. (combine_blocks): Same. (tree_if_conversion): Same. testsuite/ * g++.dg/tree-ssa/ifc-pr46029.C: New. * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel. * gcc.dg/tree-ssa/ifc-8.c: New. * gcc.dg/tree-ssa/ifc-9.c: New. * gcc.dg/tree-ssa/ifc-10.c: New. * gcc.dg/tree-ssa/ifc-11.c: New. * gcc.dg/tree-ssa/ifc-12.c: New. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New. * gcc.dg/vect/pr61194.c: Disabled. * gcc.dg/vect/vect-mask-load-1.c: Disabled. * gcc.dg/vect/vect-mask-loadstore-1.c: Disabled. * gcc.target/i386/avx2-gather-6.c: Disabled. * gcc.target/i386/avx2-vect-aggressive-1.c: Disabled. * gcc.target/i386/avx2-vect-aggressive.c: Disabled. --- gcc/ChangeLog | 28 ++ gcc/common.opt | 2 +- gcc/doc/invoke.texi| 18 +- gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C| 76 gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c | 17 + gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c | 16 + gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c | 13 + gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c | 19 +- gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c | 29 ++ gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c | 17 + .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c | 10 +- .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c | 46 +++ gcc/testsuite/gcc.dg/vect/pr61194.c| 4 +- gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c | 4 +- gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c | 4 +- gcc/testsuite/gcc.target/i386/avx2-gather-6.c | 4 +- .../gcc.target/i386/avx2-vect-aggressive-1.c | 4 +- .../gcc.target/i386/avx2-vect-aggressive.c | 4 +- gcc/tree-data-ref.c| 13 +- gcc/tree-data-ref.h| 14 + gcc/tree-if-conv.c | 416 ++--- 22 files changed, 500 insertions(+), 260 deletions(-) create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3dec6b1..70af07c 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,31 @@ +2015-05-18 Sebastian Pop + + PR tree-optimization/46029 + * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description. + * tree-if-conv.c (has_unaligned_memory_refs): New. + (if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs. + (create_scratchpad): New. + (create_indirect_cond_expr): New. + (predicate_mem_writes): Call create_indirect_cond_expr. Take an extra + parameter
Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]
On 7/2/15 4:49 AM, Alan Lawrence wrote: Thanks, Abe. You are welcome, sir! :-) As before, I'm still confused here. This still returns false, i.e. bails out of if-conversion, if the statement could trap. Doesn't the scratchpad let us handle that? Or do we just not care because it won't be vectorizable anyway??? This seems like an opportunity for more optimization in the future. However, at this time we do not see what kind of code would benefit from this optimization. If you have some time to spare and wish to spend some of it on this issue, then please find/write a test case that would exercise this path, i.e. a snippet of code that is not optimized due to the above concern, even though it _could_ be if-converted -- and, preferably, the resulting conversion _is_ vectorizer-friendly. Of course, even if a test case can be written to trigger this missed opportunity, that in and of itself does not yet tell us how much opportunity we are missing in _real-world_ code. Nit: as before [...] Thanks for the reminder[s]. Nit: as before - thanks for fixing the example here You are welcome. Where can I find info on what the different flag values mean? > (I had thought they were booleans [...] Sorry; I don`t know if that is documented anywhere yet. In this case, (-1) simply means "defaulted": on if the vectorizer is on, and off if it is off. (0) means "user specified no if conversion" and (1) means "user specified [yes] if conversion". Regards, Abe
Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]
[Abe wrote:] This seems like an opportunity for more optimization in the future [On 7/6/15 10:09 AM, Alan Lawrence wrote:] we get enough benefit from the patch, even without my suggested extra change. Ok, fair enough! Thanks for the clarification. You are welcome, sir. [Alan wrote:] Where can I find info on what the different flag values mean? (I had thought they were booleans [...] [Abe wrote:] Sorry; I don`t know if that is documented anywhere yet. In this case, (-1) simply means "defaulted": on if the vectorizer is on, and off if it is off. (0) means "user specified no if conversion" and (1) means "user specified [yes] if conversion". [Alan wrote:] Ah, right, that makes sense now. Obviously I would like to see this written in doc/ . Please consider it added to the work queue. ;-) Regards, Abe
RE: fix PR46029: reimplement if conversion of loads and stores
(if-conversion could directly generate masked load/stores of course and not use a scratch-pad at all in that case). IMO that`s a great idea, but I don`t know how to do it. Hints would be welcome. In particular, how does one "generate masked load/stores" at the GIMPLE level? But are we correctly handling these cases in the current if conversion code? I`m uncertain to what that is intended to refer, but I believe Sebastian would agree that the new if converter is safer than the old one in terms of correctness at the time of running the code being compiled. Abe's changes would seem like a step forward from a correctness standpoint Not to argue, but as a point of humility: Sebastian did by far the most work on this patch. I just modernized it and helped move it along. Even _that_ was done with Sebastian`s help. even if they take us a step backwards from a performance standpoint. For now, we have a few performance regressions, and so far we have found that it`s non-trivial to remove all of those regressions. We may be better off pushing the current patch to trunk and having the performance regressions currently introduced by the new if converter be fixed by later patches. Pushing to trunk gives us excellent "visibility" amongst GCC hackers, so the code will get more "eyeballs" than if it lingers in an uncommitted patch or in a branch. I, for one, would love some help in fixing these performance regressions. ;-) If fixing the performance regressions winds up taking too long, perhaps the current imperfect patch could be undone on trunk just before a release is tagged, and then we`ll push it in again when trunk goes back to being allowed to be unstable? According to my analysis of the data near the end of the page at <https://gcc.gnu.org/develop.html>, we have until roughly April of 2016 to work on not-yet-perfect patches in trunk. So the question is whether we get more non-vectorized if-converted code out of this (and thus whether we want to use --param allow-store-data-races to get the old code back which is nicer to less capable CPUs and probably faster than using scatter/gather or masked loads/stores). I do think conditionalizing some of this on the allow-store-data-races makes sense. I think having both the old if-converter and the new one "live on" in GCC is nontrivial, but not impossible. I also don`t think it`s the best long-term goal, but only a short-term workaround. In the long run, IMO there should be only one if converter, albeit perhaps with tweaking flags [e.g. "-fallow-unsafe-if-conversion"]. I also wonder if we should really care about load data races (not sure your patch does). According to a recent long discussion I had with Sebastian, our current patch does not have the flaw I was concerned it might have in terms of loads because: [1] the scratchpad is only being used to if-convert assignments to thread-local scalars, never to globals/statics, and because... [2] the gimplifier is supposed to detect "the address of this scalar has been taken" and when such is detected in the code being compiled, it causes the scalar to no longer look like a scalar in GIMPLE so that we are also safe from stale-data problems that could come from corner-case code that takes the address of a thread-local variable and gives that address to another thread [which then proceeds to overwrite the value in the supposedly-thread-local scalar that belongs to a different thread from the one doing the writing] Regards, Abe
Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]
[Alan wrote:] Where can I find info on what the different flag values mean? (I had thought they were booleans [...] [Abe wrote:] Sorry; I don`t know if that is documented anywhere yet. In this case, (-1) simply means "defaulted": on if the vectorizer is on, and off if it is off. (0) means "user specified no if conversion" and (1) means "user specified [yes] if conversion". [Alan wrote:] Ah, right, that makes sense now. Obviously I would like to see this written in doc/ . Is the following sufficient? From "doc/invoke.texi", I propose to replace: This is enabled by default if vectorization is enabled. ... with: This is enabled by default when vectorization is enabled anddisabled by default when vectorization is disabled. Regards, Abe
Re: fix PR46029: reimplement if conversion of loads and stores
[Abe wrote:] I believe Sebastian would agree that the new if converter is safer than the old one >> in terms of correctness at the time of running the code being compiled. [...] For now, we have a few performance regressions [snip] [Alan wrote:] TBH my two cents would be that a performance-regressed, but correct, compiler, is far better to release, than a > performance-"improved" one that generates unsafe code (e.g. with extra faults in the straightforward single-threaded case!)... I strongly agree that -- by default -- correctness trumps performance. The only times it is allowable to reverse that relationship, IMO, is when the user* of the compiler has explicitly specified flags [e.g. "-ffast-math", "-Ofast"] that tell the compiler that the user* currently cares more about performance than about {correctness-according-to-spec and/or safety in all conditions including null pointers}. ['*': or Makefiles, build scripts, etc.] FYI: TTBOMK, the old if converter was not unsafe with default flags or with only "big knobs" like "-O3"; I`m unsure what it did under "-ffast-math" and "-Ofast", if anything of interest. The main advantage of the new if converter over the old one is that the new one is safe in certain situations wherein the old one is unsafe, e.g. the old one may cause the vectorized code to segfault where the non-if-converted code would have run just fine all the way to program completion with the same inputs. This additional safety allows the new converter to be used under more conditions, which in turn allows it to be enabled by default. We intend for all the safe if-conversions to be done by default whenever the vectorizer is on. If there are any unsafe conversions left, which I`m not sure there are, then we will enable them only when the user* specifies something like "-fif-conversion=allow-unsafe". The "allows it to be enabled by default" property should help the code that GCC generates under "-O3" w/o any additional flags to be faster than it currently is, for the relevant targets, *without sacrificing even _one_ _bit_ of correctness*. Regards, Abe
Re: fix PR46029: reimplement if conversion of loads and stores
(if-conversion could directly generate masked load/stores of course and not use a scratch-pad at all in that case). [Abe wrote:] IMO that`s a great idea, but I don`t know how to do it. Hints would be welcome. In particular, how does one >> "generate masked load/stores" at the GIMPLE level? [Richard Biener wrote:] It already does that, see predicate_mem_writes. > You should definitely preserve that path Thanks. Yes, we have not intentionally disabled that. On what hardware did you test? AMD64 arch., Intel implementation. Nothing fancy AFAIK in the flags to make it super-specific, e.g. "-march=nocona" or "-march=native". Occasionally using AVX-2 flags as specified by some test cases. Regards, Abe
Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]
[Abe wrote:] Is the following sufficient? From "doc/invoke.texi", I propose to replace: This is enabled by default if vectorization is enabled. ... with: This is enabled by default when vectorization is enabled anddisabled by default when vectorization is disabled. [Alan wrote:] That would be helpful - but has the syntax not changed? AFAIK & IIRC, not yet. We plan to change the flags later. [Alan wrote:] I was thinking it'd be useful to document somewher the meanings of 1/0/-1, as you explained them to me: [snip] Perhaps in the gate() function if not in the doc. OK. It seemed like the doc files was not the right place to document internal values that flags influence. Regards, Abe
[PATCH] vax: resolve long-standing documentation bugs re floating-point codegen [PR79646]
Howdy, y`all. After many years away from contributing to GCC, I am rejoining as a "gardener": my intent/plan is to clean up the bug backlog, weeding out old bugs that are relatively-easy to fix yet have languished for a long time. First for today`s gardening: a documentation-only bug or two related to how floating-point code is generated for VAX-compatible CPUs. Yes, _VAX_. I made sure to update the translation files, using an on-line translator for languages which I can`t really read/write on my own. In at least one case where the translation was empty before, I left it empty. In at least one case where the translation was empty before, I filled it in. Copyright assignment to the FSF, yadda yadda yadda... all the usual. Sincerely, Abe From ec5f259d0e7dd7dcd1194f775bf00d3decb786f3 Mon Sep 17 00:00:00 2001 From: Abe Date: Wed, 24 Apr 2024 21:06:50 -0400 Subject: [PATCH] PR79646: corrected VAX-specific message strings related to double-precision codegen, updated+improved translations of those messages. PR target/79646 - Typos in vax.opt PR target/79646 gcc/ChangeLog: * config/vax/vax.opt: gcc/po/ChangeLog: * be.po: * da.po: * de.po: * el.po: * es.po: * fi.po: * fr.po: * hr.po: * id.po: * ja.po: * nl.po: * ru.po: * sr.po: * sv.po: * tr.po: * uk.po: * vi.po: * zh_CN.po: * zh_TW.po: --- gcc/config/vax/vax.opt | 8 gcc/po/be.po | 8 gcc/po/da.po | 10 -- gcc/po/de.po | 8 gcc/po/el.po | 8 gcc/po/es.po | 8 gcc/po/fi.po | 8 gcc/po/fr.po | 8 gcc/po/hr.po | 4 ++-- gcc/po/id.po | 8 gcc/po/ja.po | 8 gcc/po/nl.po | 4 ++-- gcc/po/ru.po | 8 gcc/po/sr.po | 10 -- gcc/po/sv.po | 6 +++--- gcc/po/tr.po | 10 -- gcc/po/uk.po | 4 ++-- gcc/po/vi.po | 8 gcc/po/zh_CN.po| 10 -- gcc/po/zh_TW.po| 10 -- 20 files changed, 73 insertions(+), 83 deletions(-) diff --git a/gcc/config/vax/vax.opt b/gcc/config/vax/vax.opt index 2cc66e543fe..fa2be78e9fa 100644 --- a/gcc/config/vax/vax.opt +++ b/gcc/config/vax/vax.opt @@ -20,19 +20,19 @@ md Target RejectNegative InverseMask(G_FLOAT) -Target DFLOAT double precision code. +Generate DFLOAT double-precision code. md-float Target RejectNegative InverseMask(G_FLOAT) -Target DFLOAT double precision code. +Generate DFLOAT double-precision code. mg Target RejectNegative Mask(G_FLOAT) -Generate GFLOAT double precision code. +Generate GFLOAT double-precision code. mg-float Target RejectNegative Mask(G_FLOAT) -Generate GFLOAT double precision code. +Generate GFLOAT double-precision code. mgnu Target RejectNegative InverseMask(UNIX_ASM) diff --git a/gcc/po/be.po b/gcc/po/be.po index ae92f9b7b96..db102673665 100644 --- a/gcc/po/be.po +++ b/gcc/po/be.po @@ -11502,13 +11502,13 @@ msgstr "" #: config/vax/vax.opt:23 config/vax/vax.opt:27 #, no-c-format -msgid "Target DFLOAT double precision code." -msgstr "" +msgid "Generate DFLOAT double-precision code." +msgstr "ÐенеÑÑÑаваÑÑ ÐºÐ¾Ð´ double-precision Ð´Ð»Ñ DFLOAT." #: config/vax/vax.opt:31 config/vax/vax.opt:35 #, no-c-format -msgid "Generate GFLOAT double precision code." -msgstr "" +msgid "Generate GFLOAT double-precision code." +msgstr "ÐенеÑÑÑаваÑÑ ÐºÐ¾Ð´ double-precision Ð´Ð»Ñ GFLOAT." #: config/vax/vax.opt:39 #, fuzzy, no-c-format diff --git a/gcc/po/da.po b/gcc/po/da.po index 9871ec5b82b..7d82a108897 100644 --- a/gcc/po/da.po +++ b/gcc/po/da.po @@ -12372,15 +12372,13 @@ msgstr "" #: config/vax/vax.opt:23 config/vax/vax.opt:27 #, fuzzy, no-c-format -#| msgid "Generate GFLOAT double precision code" -msgid "Target DFLOAT double precision code." -msgstr "Opret GFLOAT dobbeltpræcision kode" +msgid "Generate DFLOAT double-precision code." +msgstr "Opret GFLOAT dobbeltpræcision kode." #: config/vax/vax.opt:31 config/vax/vax.opt:35 #, fuzzy, no-c-format -#| msgid "Generate GFLOAT double precision code" -msgid "Generate GFLOAT double precision code." -msgstr "Opret GFLOAT dobbeltpræcision kode" +msgid "Generate GFLOAT double-precision code." +msgstr "Opret GFLOAT dobbeltpræcision kode." #: config/vax/vax.opt:39 #, fuzzy, no-c-format diff --git a/gcc/po/de.po b/gcc/po/de.po index db42d7191ff..1303b189a4b 100644 --- a/gcc/po/de.po +++ b/gcc/po/de.po @@ -11333,13 +11333,13 @@ msgstr "Daten werden ausgehend vom Start des Programmcodes statt der GOT adressi #: config/vax/vax.opt:23 config/vax/vax.opt:27 #, no-c-format -msgid "Target DFLOAT double precision code." -msgstr &q
fix PR46029: reimplement if conversion of loads and stores
Hi everybody! In the current implementation of if conversion, loads and stores are if-converted in a thread-unsafe way: * loads were always executed, even when they should have not been. Some source code could be rendered invalid due to null pointers that were OK in the original program because they were never dereferenced. * writes were if-converted via load/maybe-modify/store, which renders some code multithreading-unsafe. This patch reimplements if-conversion of loads and stores in a safe way using a scratchpad allocated by the compiler on the stack: * loads are done through an indirection, reading either the correct data from the correct source [if the condition is true] or reading from the scratchpad and later ignoring this read result [if the condition is false]. * writes are also done through an indirection, writing either to the correct destination [if the condition is true] or to the scratchpad [if the condition is false]. Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the old if-conversion resulted in unsafe code that could fail under multithreading even though the as-written code _was_ thread-safe. Passed regression testing and bootstrap on amd64-linux. Is this OK to commit to trunk? Regards, Abe 2015-06-12 Sebastian Pop Abe Skolnik PR tree-optimization/46029 * tree-data-ref.c (struct data_ref_loc_d): Moved... (get_references_in_stmt): Exported. * tree-data-ref.h (struct data_ref_loc_d): ... here. (get_references_in_stmt): Declared. * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description. * tree-if-conv.c (struct ifc_dr): Removed. (IFC_DR): Removed. (DR_WRITTEN_AT_LEAST_ONCE): Removed. (DR_RW_UNCONDITIONALLY): Removed. (memrefs_read_or_written_unconditionally): Removed. (write_memrefs_written_at_least_once): Removed. (ifcvt_could_trap_p): Does not take refs parameter anymore. (ifcvt_memrefs_wont_trap): Removed. (has_non_addressable_refs): New. (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs. Removed use of refs. (if_convertible_stmt_p): Removed use of refs. (if_convertible_gimple_assign_stmt_p): Same. (if_convertible_loop_p_1): Removed use of refs. Remove initialization of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY. (insert_address_of): New. (create_scratchpad): New. (create_indirect_cond_expr): New. (predicate_mem_writes): Call create_indirect_cond_expr. Take an extra parameter for scratch_pad. (combine_blocks): Same. (tree_if_conversion): Same. testsuite/ * g++.dg/tree-ssa/ifc-pr46029.C: New. * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel. * gcc.dg/tree-ssa/ifc-8.c: New. * gcc.dg/tree-ssa/ifc-9.c: New. * gcc.dg/tree-ssa/ifc-10.c: New. * gcc.dg/tree-ssa/ifc-11.c: New. * gcc.dg/tree-ssa/ifc-12.c: New. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New. --- gcc/ChangeLog | 28 ++ gcc/doc/invoke.texi| 18 +- gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C| 76 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c | 17 + gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c | 16 + gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c | 13 + gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c | 19 +- gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c | 29 ++ gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c | 17 + .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c | 10 +- .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c | 46 +++ gcc/tree-data-ref.c| 13 +- gcc/tree-data-ref.h| 14 + gcc/tree-if-conv.c | 392 + 14 files changed, 460 insertions(+), 248 deletions(-) create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3dec6b1..70af07c 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,31 @@ +2015-05-18 Sebastian Pop + + PR tree-optimization/46029 + * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description. + * tree-if-conv.c (has_unaligned_memory_refs): New. + (if_convertible_gimple_assign_