Re: [RFD] Simplifying subregs in LRA
> (r243782 git:856bd6f) > This is another case of multiple changes where some were not critical and > overall there is a dangerous one here I believe. The primary aim of this > change is to reload the address before reloading the inner subreg. This > appears to be a dormant bug since day1 as the original logic would have > failed when reloading an inner mem if its address was not already valid. > > The potentially fatal part of this change is the introduction of a > "return false" in the MEM subreg simplification which is placed immediately > after restoring the original subreg expression. I believe that if control > ever actually reaches this statement then LRA would infinite loop as the > MEM subreg would never be simplified. How can a change that is a no-op be fatal exactly? > So what are the next steps! > > 1) [BUG] Add an exclusion for WORD_REGISTER_OPERATIONS because MIPS is > currently broken by the existing code. PR78660 That seems the way to go, with the appropriate check on the mode sizes. > 2) [BUG] Remove the return false introduced in (r243782 git:856bd6f). !??? > 3) [CLEANUP] Remove reg_mode argument and replace all uses of reg_mode with > innermode. Rename 'reg' to 'inner' and 'operand' to 'outer' and 'mode' > to 'outermode'. > 4) [OPTIMISATION] Change double-reload logic so that it just deals with the > special outermode reload without adjusting the subreg. > 5) [??] Determine if big endian still needs a special case like in reload? > Comments anyone? I agree that a cleanup of the code would probably be in order, with an eye on the reload code as a model, but that's probably not appropriate for GCC 7. > In an attempt to make a minimal change I propose the following as it allows > WORD_REGISTER_OPERATIONS targets to benefit from the invalid address > reloading fix.
I think the check would be more appropriately placed on the > outer-most if (MEM_P (reg)) but this would affect the handling of many more > subregs which seems too dangerous at this point in release. > > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c > index 393cc69..771475a 100644 > --- a/gcc/lra-constraints.c > +++ b/gcc/lra-constraints.c > @@ -1512,10 +1512,11 @@ simplify_operand_subreg (int nop, machine_mode > reg_mode) equivalences in function lra_constraints) and because for spilled > pseudos we allocate stack memory enough for the biggest > corresponding paradoxical subreg. */ > - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))) > + if (!WORD_REGISTER_OPERATIONS > + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) > return true; > > *curr_id->operand_loc[nop] = operand; > > > The change will affect at least arc,mips,rx,sh,sparc though I haven't > checked which of these default on for LRA just that they can turn on LRA. Only MIPS and SPARC (see https://gcc.gnu.org/backends.html). -- Eric Botcazou
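To make the operator grouping of the proposed condition explicit, here is a small boolean model in C. It is only a sketch under stated assumptions: the function names and the two flags are hypothetical stand-ins for the sub-expressions of the real condition in simplify_operand_subreg, and a true result is taken to mean the early "return true" that keeps the simplified outer-mode MEM (subst), as described in the thread. What happens after falling through is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the early-return condition in
   simplify_operand_subreg; not GCC code.
     outer_slow: MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
                 && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))
     inner_slow: MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
                 && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))
   A true result stands for "return true", i.e. keep the outer-mode MEM.  */

/* Condition before the proposed patch.  */
bool keep_outer_mem_before (bool outer_slow, bool inner_slow)
{
  return !outer_slow || inner_slow;
}

/* Condition after the proposed patch: the new test wraps the whole
   original expression, so it is false whenever WORD_REGISTER_OPERATIONS
   is nonzero.  */
bool keep_outer_mem_after (bool word_register_operations,
                           bool outer_slow, bool inner_slow)
{
  return !word_register_operations
         && keep_outer_mem_before (outer_slow, inner_slow);
}

/* Exhaustively check the two properties the patch relies on.  */
void check_patch_properties (void)
{
  for (int o = 0; o <= 1; o++)
    for (int i = 0; i <= 1; i++)
      {
        /* Unchanged behaviour without WORD_REGISTER_OPERATIONS.  */
        assert (keep_outer_mem_after (false, o, i)
                == keep_outer_mem_before (o, i));
        /* Early return never taken with WORD_REGISTER_OPERATIONS.  */
        assert (!keep_outer_mem_after (true, o, i));
      }
}
```

The exhaustive check matches the intent of the proposal: targets without WORD_REGISTER_OPERATIONS behave exactly as before, while WORD_REGISTER_OPERATIONS targets never take the early return; for example keep_outer_mem_after (true, false, false) is false where the unpatched predicate gives true.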
Intel Phi co-processor support
Hello, Can I compile code on Linux with gfortran and run it on a Phi co-processor? Or is it better to use the Intel Fortran compiler? Angel
Re: Intel Phi co-processor support
On Fri, Feb 03, 2017 at 02:50:37PM +0200, Angel Dimitrov wrote: > Can I compile code on Linux with gfortran and run it on a Phi > co-processor? Or is it better to use the Intel Fortran compiler? It depends on which Xeon Phi you have. GCC doesn't support Knights Ferry or Knights Corner, but does support Knights Landing. That said, for KNL I've so far only seen standalone KNL processors, for which I'm not sure if offloading is possible or desirable; IMHO if KNL is the main processor in the computer, then everything is host for you and thus just using non-target OpenMP code should be sufficient, so KNL offloading should be (mainly or solely) for the case when KNL is a coprocessor. Does such a thing really exist, or is it planned? Can somebody from Intel please clarify? Jakub
RE: [RFD] Simplifying subregs in LRA
Eric Botcazou writes: > > (r243782 git:856bd6f) > > This is another case of multiple changes where some were not critical > > and overall there is a dangerous one here I believe. The primary aim > > of this change is to reload the address before reloading the inner > > subreg. This appears to be a dormant bug since day1 as the original > > logic would have failed when reloading an inner mem if its address was > not already valid. > > > > The potentially fatal part of this change is the introduction of a > > "return false" in the MEM subreg simplification which is placed > > immediately after restoring the original subreg expression. I believe > > that if control ever actually reaches this statement then LRA would > > infinite loop as the MEM subreg would never be simplified. > > How can a change that is a no-op be fatal exactly? It's not a no-op. Any MEM_P not handled by the first "if (MEM_P (reg))" would previously have been handled by the block guarded by the following condition later in the function; note the "|| MEM_P (reg)":

      /* Force a reload of the SUBREG_REG if this is a constant or PLUS or
         if there may be a problem accessing OPERAND in the outer
         mode.  */
      if ((REG_P (reg)
           && REGNO (reg) >= FIRST_PSEUDO_REGISTER
           && (hard_regno = lra_get_regno_hard_regno (REGNO (reg))) >= 0
           /* Don't reload paradoxical subregs because we could be looping
              having repeatedly final regno out of hard regs range.  */
           && (hard_regno_nregs[hard_regno][innermode]
               >= hard_regno_nregs[hard_regno][mode])
           && simplify_subreg_regno (hard_regno, innermode,
                                     SUBREG_BYTE (operand), mode) < 0
           /* Don't reload subreg for matching reload.  It is actually
              valid subreg in LRA.  */
           && ! LRA_SUBREG_P (operand))
          || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
        {

> > So what are the next steps! > > > > 1) [BUG] Add an exclusion for WORD_REGISTER_OPERATIONS because MIPS is > > currently broken by the existing code. PR78660 > > That seems the way to go, with the appropriate check on the mode sizes.
I'm not sure what check to do on mode sizes. Do you think an innermode reload is only required when both modes have the same number of words? > > 2) [BUG] Remove the return false introduced in (r243782 git:856bd6f). > > !??? If a MEM subreg is neither simplified to an outermode MEM nor reloaded in innermode then I believe LRA will never resolve the subreg. Even if that is not true, I'm fairly certain the addition of the code has changed behaviour and that the change is not well understood, as explained above. > > 3) [CLEANUP] Remove reg_mode argument and replace all uses of reg_mode > with > > innermode. Rename 'reg' to 'inner' and 'operand' to 'outer' and > 'mode' > > to 'outermode'. > > 4) [OPTIMISATION] Change double-reload logic so that it just deals > with the > > special outermode reload without adjusting the subreg. > > 5) [??] Determine if big endian still needs a special case like in > reload? > > Comments anyone? > > I agree that a cleanup of the code would probably be in order, with an > eye on the reload code as a model, but that's probably not appropriate > for GCC 7. Indeed, we definitely want to wait for GCC 8. > > In an attempt to make a minimal change I propose the following as it > > allows WORD_REGISTER_OPERATIONS targets to benefit from the invalid > > address reloading fix. I think the check would be more appropriately > > placed on the outer-most if (MEM_P (reg)) but this would affect the > > handling of many more subregs which seems too dangerous at this point > in release. > > > > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index > > 393cc69..771475a 100644 > > --- a/gcc/lra-constraints.c > > +++ b/gcc/lra-constraints.c > > @@ -1512,10 +1512,11 @@ simplify_operand_subreg (int nop, machine_mode > > reg_mode) equivalences in function lra_constraints) and because for > > spilled pseudos we allocate stack memory enough for the biggest > > corresponding paradoxical subreg.
*/ > > - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > > - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))) > > + if (!WORD_REGISTER_OPERATIONS > > + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) > > + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) > > + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) > > + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN > (reg))))) > > return true; > > > > *curr_id->operand_loc[nop] = operand; > > > > > > The change will affect at least arc,mips,rx,sh,sparc though I haven't > > checked which of these default on for LRA just that they can turn on > LRA. > > Only MIPS and SPARC (see https://gcc.gnu.org/bac
Re: Intel Phi co-processor support
2017-02-03 16:00 GMT+03:00 Jakub Jelinek : > > On Fri, Feb 03, 2017 at 02:50:37PM +0200, Angel Dimitrov wrote: > > Can I compile code on Linux with gfortran and run it on a Phi > > co-processor? Or is it better to use the Intel Fortran compiler? > > It depends on which Xeon Phi you have. GCC doesn't support Knights Ferry > or Knights Corner, but does support Knights Landing. > That said, for KNL I've so far only seen standalone KNL processors, for > which I'm not sure if offloading is possible or desirable; IMHO if It is possible using so-called "offload over fabric". Here is a how-to [1], which can be adapted just by replacing "icc -qopenmp" with "gcc -fopenmp", I guess. [1] https://software.intel.com/en-us/articles/how-to-use-offload-over-fabric-with-knights-landing-intel-xeon-phi-processor > KNL is the main processor in the computer, then everything is host > for you and thus just using non-target OpenMP code should be sufficient, > so KNL offloading should be (mainly or solely) for the case when > KNL is a coprocessor. Does such a thing really exist, or is it planned? > Can somebody from Intel please clarify? > > Jakub -- Ilya
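As a rough illustration of the two cases discussed in this thread (KNL as the host, where plain OpenMP is enough, versus a target region that could be offloaded, e.g. over fabric), here is a minimal C sketch. The function names are hypothetical; gfortran accepts the equivalent !$omp parallel do and !$omp target teams distribute parallel do directives. Whether the target region actually runs on a KNL device depends on a GCC build configured with offloading support, which this sketch does not cover; when no device is available the region runs on the host.

```c
#include <assert.h>

/* Hypothetical example: plain (non-target) OpenMP, which is all that is
   needed when KNL is the main processor.  */
long sum_squares_host (int n)
{
  long sum = 0;

  assert (n >= 0);
#pragma omp parallel for reduction(+: sum)
  for (int i = 0; i < n; i++)
    sum += (long) i * i;
  return sum;
}

/* Hypothetical example: the same loop in a target region, which can be
   offloaded when an offload device is configured and available; otherwise
   the region is executed on the host.  */
long sum_squares_target (int n)
{
  long sum = 0;

  assert (n >= 0);
#pragma omp target teams distribute parallel for map(tofrom: sum) reduction(+: sum)
  for (int i = 0; i < n; i++)
    sum += (long) i * i;
  return sum;
}
```

Build with gcc -fopenmp. Without -fopenmp GCC ignores the pragmas and both functions simply run serially, giving the same results.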
Re: [RFD] Simplifying subregs in LRA
> If a MEM subreg is neither simplified to an outermode MEM nor reloaded > in innermode then I believe LRA will never resolve the subreg. Even if that > is not true I'm fairly certain the addition of the code has changed > behaviour and that the change is not well understood, as explained above. Fair enough, let's remove the "return false" then. > I hadn't spotted that table, very helpful, thanks. The other architectures > I listed may be helped in their transition to LRA. I guess they are > attempting to move given optional LRA support. You can send me the patch(es) in advance, I'll give it a try on SPARC. -- Eric Botcazou
Re: [RFD] Simplifying subregs in LRA
On 02/01/2017 06:52 PM, Matthew Fortune wrote: Hi all, I've copied you as you have each made some significant change to a function in LRA which I guess makes you de-facto experts. I've spent a while researching the history of simplify_operand_subreg and in particular the behaviour for subregs of memory. For my sake, if no-one else's, here is a rundown of its evolution; corrections welcome. Thanks for doing the research, Matt. minated on the next iteration (r198344 git:ea99c7a) A special case for an LRA-introduced subreg was added (LRA_SUBREG_P) that should always be considered valid. This I believe is to cope with cases where there are operands required to match but with different modes and, presumably, one of the modes is not actually allowed. Not 100% sure what this is though! As I remember, x86 has such tricky x87 FP stack insn definitions with matching operands of different modes. As reload makes all RTL changes at once at the end of its work, it was not a problem. LRA transforms RTL permanently during its work, and RTL that is illegal at some point of that work might be a problem. ... So what are the next steps! 1) [BUG] Add an exclusion for WORD_REGISTER_OPERATIONS because MIPS is currently broken by the existing code. PR78660 2) [BUG] Remove the return false introduced in (r243782 git:856bd6f). 3) [CLEANUP] Remove reg_mode argument and replace all uses of reg_mode with innermode. Rename 'reg' to 'inner' and 'operand' to 'outer' and 'mode' to 'outermode'. 4) [OPTIMISATION] Change double-reload logic so that it just deals with the special outermode reload without adjusting the subreg. 5) [??] Determine if big endian still needs a special case like in reload? Comments anyone? Like Eric, I prefer changes which affect the minimum of targets and cases but still fix the bug. Almost any change in this part of LRA has required some stabilization changes. It is dangerous to do a big cleanup at this development stage of GCC. A bigger cleanup could be done in stage 1 after the GCC 7 release.
I hope you and Eric will do this. Reload is a good reference point. Historically, LRA was originally written for a few targets without taking into account the corner cases of all other targets. Trying to implement all reload cases without a good understanding of them or their necessity would have made LRA as a project impossible. In an attempt to make a minimal change I propose the following as it allows WORD_REGISTER_OPERATIONS targets to benefit from the invalid address reloading fix. I think the check would be more appropriately placed on the outer-most if (MEM_P (reg)) but this would affect the handling of many more subregs which seems too dangerous at this point in release. diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 393cc69..771475a 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -1512,10 +1512,11 @@ simplify_operand_subreg (int nop, machine_mode reg_mode) equivalences in function lra_constraints) and because for spilled pseudos we allocate stack memory enough for the biggest corresponding paradoxical subreg. */ - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))) + if (!WORD_REGISTER_OPERATIONS + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode) + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst))) + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))) return true; *curr_id->operand_loc[nop] = operand; The change will affect at least arc,mips,rx,sh,sparc though I haven't checked which of these default on for LRA just that they can turn on LRA. I'll post this as a patch with appropriate updates to comments unless anyone raises some issues. OK. Thanks.
[RFC] [i386] Test program for ms_abi to sysv_abi function calls
This is a test program designed to test 64-bit Microsoft ABI functions that call System V functions in a multitude of permutations, in an attempt to discover flaws in the generation of prologues and epilogues and in the optimizations and features that can affect them, specifically shrink-wrapping, sibling calls and DRAP. This is an accompaniment to the patch sets below (and was instrumental in finding and fixing flaws!), but is not an ancestor.

Use aligned SSE movs for re-aligned MS ABI pro/epilogues - https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01859.html
Use out-of-line stubs for ms_abi pro/epilogues - (v3 to be posted shortly! Old v2: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02293.html)

Summary
=======

A C++ generator program (gen.cc) is built and run to generate an unguarded C header file, which is included by msabi.c. This header defines a number of ms_abi functions that call sysv_abi functions with various conditions. The generator has a CLI for test selection, and the arguments currently used (in msabi.exp) generate about 19k tests. The objects built from msabi.c and do_test.S are linked to create the test program. Building msabi.c takes 200-ish seconds on an old Phenom, but execution is nearly instantaneous.

Each test function is called via an assembly stub (do_test_aligned or do_test_unaligned) that:
1. Saves non-volatile registers to a global buffer,
2. Fills non-volatile registers with random data,
3. Calls the test function,
4. Saves the resulting non-volatile registers for later comparison, and
5. Restores non-volatile registers to their original values.

Upon completion, the resulting register values are compared to the original random values to verify correctness. The return value is also checked against the expected value, which validates the correctness of both the generated code and the test itself.

What Is Tested
==============

The test permutations consist of:

A. A number of extra long parameters for the function (0-5 being used now).
B.
A mask of additional non-volatile registers to explicitly clobber (aside from RDI, RSI and XMM6-15, which are always clobbered):

enum optional_regs
{
  OPTIONAL_REG_RBX = 0x01,
  OPTIONAL_REG_RBP = 0x02,
  OPTIONAL_REG_R12 = 0x04,
  OPTIONAL_REG_R13 = 0x08,
  OPTIONAL_REG_R14 = 0x10,
  OPTIONAL_REG_R15 = 0x20,
  OPTIONAL_REG_ALL = 0x3f,
  OPTIONAL_REG_HFP_ALL = OPTIONAL_REG_ALL & (~OPTIONAL_REG_RBP)
};

C. A collection of variants:

enum fn_variants
{
  FN_VAR_MSABI = 0x01, /* This value is an implementation detail and NOT a test permutation. */
  FN_VAR_HFP = 0x02,
  FN_VAR_REALIGN = 0x04,
  FN_VAR_ALLOCA = 0x08,
  FN_VAR_VARARGS = 0x10,
  FN_VAR_SIBCALL = 0x20,
  FN_VAR_SHRINK_WRAP = 0x40,
  FN_VAR_HFP_OR_REALIGN = FN_VAR_HFP | FN_VAR_REALIGN,
  FN_VAR_MASK = 0x7f,
  FN_VAR_COUNT = 7
};

The variants deserve a little more explanation.

* FN_VAR_MSABI (implementation detail)
  Adds __attribute__((ms_abi)).
* FN_VAR_HFP
  Adds __attribute__((optimize("no-omit-frame-pointer"))). FN_VAR_HFP and FN_VAR_REALIGN are mutually exclusive.
* FN_VAR_REALIGN
  Adds __attribute__((__force_align_arg_pointer__)). The test will call this function twice: once with an aligned and once with a misaligned stack.
* FN_VAR_ALLOCA
  The ms_abi function calls alloca and passes the pointer to the sysv_abi function.
* FN_VAR_VARARGS
  The ms_abi function takes varargs, but only passes the argptr to the sysv_abi function.
* FN_VAR_SIBCALL
  The ms_abi function returns in a way that enables the sibling call optimization (skipped if FN_VAR_REALIGN or FN_VAR_HFP is enabled).
* FN_VAR_SHRINK_WRAP
  Tests a global variable and uses a branch that enables the use of shrink wrapping. Both the fast and slow paths are tested.
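The variant rules above, together with the naming scheme used for the generated functions, can be sketched in C as follows. This is a hypothetical illustration, not code from gen.cc: variant_is_valid encodes only the two restrictions stated in the description (FN_VAR_HFP and FN_VAR_REALIGN are mutually exclusive; sibling calls are skipped with either), build_fn_name covers only msabi_* names, and printing the clobber mask with %x (no zero padding) is an assumption. The enum values are those of fn_variants quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values copied from enum fn_variants in the description.  */
enum fn_variants
{
  FN_VAR_MSABI = 0x01,   /* Implementation detail, not a test permutation.  */
  FN_VAR_HFP = 0x02,
  FN_VAR_REALIGN = 0x04,
  FN_VAR_ALLOCA = 0x08,
  FN_VAR_VARARGS = 0x10,
  FN_VAR_SIBCALL = 0x20,
  FN_VAR_SHRINK_WRAP = 0x40,
  FN_VAR_HFP_OR_REALIGN = FN_VAR_HFP | FN_VAR_REALIGN,
  FN_VAR_MASK = 0x7f
};

/* Hypothetical filter: returns 0 for combinations the description rules out.  */
int variant_is_valid (unsigned var)
{
  /* FN_VAR_HFP and FN_VAR_REALIGN are mutually exclusive.  */
  if ((var & FN_VAR_HFP_OR_REALIGN) == FN_VAR_HFP_OR_REALIGN)
    return 0;
  /* FN_VAR_SIBCALL is skipped when FN_VAR_HFP or FN_VAR_REALIGN is set.  */
  if ((var & FN_VAR_SIBCALL) && (var & FN_VAR_HFP_OR_REALIGN))
    return 0;
  return 1;
}

/* Hypothetical name builder for msabi_<clobber mask>_[r|f][a][v][s][w]<n>.
   Returns the snprintf result (length of the full name).  */
int build_fn_name (char *buf, size_t len, unsigned clobbers, unsigned var,
                   int nparams)
{
  char flags[8];
  int n = 0;

  if (var & FN_VAR_REALIGN)
    flags[n++] = 'r';
  else if (var & FN_VAR_HFP)
    flags[n++] = 'f';
  if (var & FN_VAR_ALLOCA)
    flags[n++] = 'a';
  if (var & FN_VAR_VARARGS)
    flags[n++] = 'v';
  if (var & FN_VAR_SIBCALL)
    flags[n++] = 's';
  if (var & FN_VAR_SHRINK_WRAP)
    flags[n++] = 'w';
  flags[n] = '\0';

  /* Assumption: the mask is printed in lower-case hex without padding.  */
  return snprintf (buf, len, "msabi_%x_%s%d", clobbers, flags, nparams);
}

/* Count valid combinations of the six test-relevant variant bits.  */
int count_valid_variants (void)
{
  int count = 0;

  /* Stepping by 2 keeps bit 0 (FN_VAR_MSABI) clear.  */
  for (unsigned var = 0; var <= FN_VAR_MASK; var += 2)
    if (variant_is_valid (var))
      count++;
  return count;
}

/* Check against the msabi_25_ra2 example: RBX | R12 | R15 = 0x25.  */
void check_example_name (void)
{
  char buf[32];

  build_fn_name (buf, sizeof buf, 0x25, FN_VAR_REALIGN | FN_VAR_ALLOCA, 2);
  assert (strcmp (buf, "msabi_25_ra2") == 0);
}
```

With these two rules the sketch admits 32 of the 64 combinations of the six test-relevant bits; that figure is a property of this sketch only, not of the generator, which also varies the clobber mask and the number of extra parameters.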
The following nomenclature is used for function names:

(msabi|sysv)_<clobbers>_[r|f][a][v][s][w]<params>
  <clobbers>  Explicit clobbers (hexadecimal mask)
  r|f         Forced realignment or hard frame pointer
  a           alloca
  v           varargs
  s           sibling call
  w           shrink wrap
  <params>    Number of extra parameters (longs)

Calling Convention Examples

The function msabi_25_ra2 looks like this:

__attribute__ ((noinline, ms_abi, __force_align_arg_pointer__))
long msabi_25_ra2 (long a, long b)
{
  void *alloca_mem;

  alloca_mem = alloca (8 + a);
  *(long*)alloca_mem = FLAG_ALLOCA;
  __asm__ __volatile__ ("" :::"rbx", "r12", "r15");
  return sysv_a2_noinfo (alloca_mem, a, b);
}

And the tests (both aligned and misaligned) look something like this:
void init_test (void *fn, const char *name, enum alignment_option alignment, enum shrink_wrap_option shrink_wrap, long ret_expected); void do_tests () { long ret; long a = 1; lo