Re: making the new if-converter not mangle IR that is already vectorizer-friendly
On Fri, Jul 31, 2015 at 7:43 PM, Abe wrote: > [Abe wrote:] >>> >>> TTBOMK, that "expense" is temporary, pending fixes/improvement >>> to the new converter. IOW, I don`t think any of the relevant >>> regressions are as a result of "essential complexity". > > > [Richard wrote:] >> >> I don't see how you can fix this without scatter support in the CPU. > > > Well, for some test cases, I can see in the GIMPLE dumps that > a/the scratchpad wasn`t needed here and wasn`t needed there. > IOW, the converter that "knows how" to indirect via scratchpads > is doing so _always_ rather than only when it`s truly needed. > This is causing additional indirection which the vectorizer > rejects. Making the new converter "smarter" should fix this. > > Overall, however, I see your point and agree with it. Gather support > is needed for the loads and scatter for the stores, in the general case. You skipped my analysis that even with scatter/gather support as implemented by AVX on x86_64 vectorization is not possible because of how those instructions are designed. Note that the old converter is already "smarter". Note that the old converter knows to use masked loads/stores which is a closer match to the original code and which will be faster and trivially vectorizable. Also note that all x86_64 CPUs that suport gather also support masked loads and stores. I know no other CPU architecture with scatter/gather support (and doubt they do not at the same time implement masked loads/stores) Richard. > > [Richard wrote:] >> >> But then you should restrict the if-conversion >> to targets that can do gather vectorization. > > > I think adding this requirement -- to generalize, _two_ req.s: > _gather_ support for _load_ if conversion with [a] scratchpad[s], > and scatter support for _store_ if conversion with [a] scratchpad[s] -- > is a reasonable and probably wise goal. Doing so should eventually > enable us to conclude automatically "it is safe to enable this kind of > if conversion for/on this target" and enable the relevant transformations > automatically when given e.g. "-O3" and autovectorization. > > > [Abe wrote] >>> >>> As of now, the relevant "improvement" in/due-to the new >>> converter is not always effective [...] We intend to fix this. > > > [Richard wrote:] >> >> This should be fixed before the patch can go in. > > > The current plan is to add new switches that control which [if any] > if converter is active, and to have the old one _and_ the new one > be included in GCC trunk for the time being. Then I/we can have > more eyes on the new code, helping with bugs etc., without having > any impact at all on code that only either is compiled with > the old flags [which shall retain their old semantics > for now] or with no relevant flags whatsoever. > > Is that plan OK? > > > [Richard wrote:] >> >> There is the existing mechanism to if-convert only a copy >> of the loop guarded with IFN_LOOP_VECOTRIZED so the >> cost model is that of the vectorizer when it tries to vectorize >> the if-converted copy. If it doesn't vectorize that copy the >> non-if-converted loop is preserved, otherwise it is deleted. > > > I think I found the code to which you are referring here, > and I tried to use it [so as to not reduplicate efforts], > but my first effort at doing so was unsuccessful AFAIK. > Trying again -- this time with help from Jakub, > I hope -- is on my work agenda. > > > [Richard wrote:] >> >> I've argued in the past that the GIMPLE if-converter should >> always go that path (thus strictly be a vectorization enabler only). > > > If I understand correctly what you mean by that, then I agree. > > >> if-conversion is already on by default if vectorizing. > >> We only do not introduce store-data-races by default. > > Thanks. > > Do you know if the e.g. GCC 5.1 [and 5.2, if bug fix not included] > release[s] were "safe" even without your recent bug fix [July 10] > for code _not_ compiled with "-ftree-loop-if-convert-stores"? > I ask b/c the test case I wrote which was able to cause GCC to > generate crashing code [for a correct-under-low-optimization test > case] intentionally had a crash due to dereferencing a null pointer > in a _load_. IIRC, getting GCC to generate that bad code required > "-ftree-loop-if-convert-stores" despite it being a faulty _load_. > > FYI: I have made some progress on the SPEC testing, but I do not > yet have the work at a point where I`m ready to report numbers. > > > Regards, > > Abe
July/August GNU Toolchain update
Hi Guys, Sorry for the delay in bringing you this update; I have been very busy in the last few months. Anyway the highlights of the changes to the GNU Toolchain are as follows: * The GDB 7.10 branch has been created. Expect a release soon. * Support for tracepoints on aarch64-linux was added in GDBserver. * A point update of the FSF Binutils - 2.25.1 - has been released. No new features but lots of important bug fixes. * GCC 5.2 has been released. This is a bug-fix update to the previous 5.1 release. * The linker now has experimental support for the removal of redundant sections from COFF and PE format files. This is enabled via the --gc-sections linker command line option. * A new linker command line option --require-defined= has been added. This behaves in much the same way as the --undefined= option in that it creates a reference to an undefined symbol that should force a library to be pulled into the link or garbage collection not to remove a specific section. The difference between --require-defined and --undefined is that with the former the linker will issue an error message if the specified symbol has not been defined by the end of the link. * The --disassemble (or -d) and --disassemble-all (or -D) options to objdump have received a subtle change. With -d objdump will assume that any symbols present in a code section occur on the boundary between instructions and so it will refuse to disassemble across such a boundary. With -D however this assumption is suppressed. This means that it is possible for the output of -d and -D to differ if, for example, data is stored in a code section. * GCC has a couple of new warning options available: -Wframe-address This generates a warning when the __builtin_frame_address or __builtin_return_address are called with an argument greater than 0. Such calls may return indeterminate values or crash the program. -Wtautological-compare This generates a warning if a self-comparison always evaluates to true or false. This detects various mistakes such as: int i = 1; if (i > i) { ... } * With the Nios II port of GCC it is now possible to specify the target architecture variant with -march=r1 or -march=r2. It is also possible to explicitly enable or disable the use of the r2 BMX (bit manipulation) and CDX (code density) instructions via the use of the new -mbmx -mno-bmx -mcdx and -mno-cdx options. Cheers Nick
[PATCH] PR rtl-optimization/67029: gcc-5.2.0 unable to find a register to spill with O3 fsched-pressure fschedule-insns
Since ira_implicitly_set_insn_hard_regs may be called outside of ira-lives.c, it can't use the local variable, preferred_alternatives. This patch adds an alternative_mask argument to ira_implicitly_set_insn_hard_regs. OK for master and 5 branch if there are no regressions on Linux/x86-64? H.J. --- gcc/ PR rtl-optimization/67029 * ira-color.c: Include "recog.h" before including "ira-int.h". * target-globals.c: Likewise. * ira-lives.c (ira_implicitly_set_insn_hard_regs): Add an adds an alternative_mask argument and use it instead of preferred_alternatives. * ira.h (ira_implicitly_set_insn_hard_regs): Moved to ... * ira-int.h (ira_implicitly_set_insn_hard_regs): Here. * sched-deps.c: Include "ira-int.h" after including "ira.h". (sched_analyze_insn): Update call to ira_implicitly_set_insn_hard_regs. * sel-sched.c: Include "ira-int.h" after including "ira.h". (implicit_clobber_conflict_p): Update call to ira_implicitly_set_insn_hard_regs. gcc/testsuite/ PR rtl-optimization/67029 * gcc.dg/pr67029.c: New test. --- gcc/ira-color.c| 1 + gcc/ira-int.h | 2 ++ gcc/ira-lives.c| 4 ++-- gcc/ira.h | 1 - gcc/sched-deps.c | 4 +++- gcc/sel-sched.c| 4 +++- gcc/target-globals.c | 1 + gcc/testsuite/gcc.dg/pr67029.c | 14 ++ 8 files changed, 26 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr67029.c diff --git a/gcc/ira-color.c b/gcc/ira-color.c index 74d2c2e..c8f33ed 100644 --- a/gcc/ira-color.c +++ b/gcc/ira-color.c @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" #include "ira.h" #include "alloc-pool.h" +#include "recog.h" #include "ira-int.h" typedef struct allocno_hard_regs *allocno_hard_regs_t; diff --git a/gcc/ira-int.h b/gcc/ira-int.h index a7c0f40..a993dfc 100644 --- a/gcc/ira-int.h +++ b/gcc/ira-int.h @@ -1041,6 +1041,8 @@ extern void ira_debug_live_ranges (void); extern void ira_create_allocno_live_ranges (void); extern void ira_compress_allocno_live_ranges (void); extern void ira_finish_allocno_live_ranges (void); +extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *, + alternative_mask); /* ira-conflicts.c */ extern void ira_debug_conflicts (bool); diff --git a/gcc/ira-lives.c b/gcc/ira-lives.c index 1cb05c2..011d513 100644 --- a/gcc/ira-lives.c +++ b/gcc/ira-lives.c @@ -831,7 +831,8 @@ single_reg_operand_class (int op_num) might be used by insn reloads because the constraints are too strict. */ void -ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set) +ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set, + alternative_mask preferred) { int i, c, regno = 0; enum reg_class cl; @@ -854,7 +855,6 @@ ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set) mode = (GET_CODE (op) == SCRATCH ? GET_MODE (op) : PSEUDO_REGNO_MODE (regno)); cl = NO_REGS; - alternative_mask preferred = preferred_alternatives; for (; (c = *p); p += CONSTRAINT_LEN (c, p)) if (c == '#') preferred &= ~ALTERNATIVE_BIT (0); diff --git a/gcc/ira.h b/gcc/ira.h index 504b5e6..881674b 100644 --- a/gcc/ira.h +++ b/gcc/ira.h @@ -192,7 +192,6 @@ extern void ira_init (void); extern void ira_setup_eliminable_regset (void); extern rtx ira_eliminate_regs (rtx, machine_mode); extern void ira_set_pseudo_classes (bool, FILE *); -extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *); extern void ira_expand_reg_equiv (void); extern void ira_update_equiv_info_by_shuffle_insn (int, int, rtx_insn *); diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 3ac66e8..0a8dcb0 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see #include "alloc-pool.h" #include "cselib.h" #include "ira.h" +#include "ira-int.h" #include "target.h" #ifdef INSN_SCHEDULING @@ -2891,7 +2892,8 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn) extract_insn (insn); preprocess_constraints (insn); - ira_implicitly_set_insn_hard_regs (&temp); + alternative_mask prefrred = get_preferred_alternatives (insn); + ira_implicitly_set_insn_hard_regs (&temp, prefrred); AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp); } diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index ec2ab05..1860444 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see #include "rtlhooks-def.h" #include "emit-rtl.h" #include "ira.h" +#include "ira-int.h" #include "rtl-iter.h" #ifdef INSN_SCHEDULING @@ -2104,7 +2105,8 @@ implic
gcc-4.9-20150805 is now available
Snapshot gcc-4.9-20150805 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150805/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.9 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 226647 You'll find: gcc-4.9-20150805.tar.bz2 Complete GCC MD5=ec5f377410e6ccd5c6ed08b766d3ffad SHA1=008fd20869c46941b7e069588d5e7c5c717137ee Diffs from 4.9-20150729 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.9 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.