Re: Dodji Seketeli appointed diagnostics framework maintainer
Gerald Pfeifer wrote:

> I am happy to announce Dodji Seketeli as diagnostics framework
> maintainer.
>
> Thanks for your contributions and agreeing to fill this role, Dodji!

Thank you!

> And thanks to Gaby for his contributions in this area over the years
> and the professional manner he has been initiating this transition
> (including some final contributions and nominating Dodji).

Indeed.

> PS: Please update MAINTAINERS accordingly.

Done.

Gabriel Dos Reis wrote:

> Congratulations, Dodji!

Thank you for considering me for this position.

> Have a lot of fun :-)

I will try :-)

Cheers.

--
Dodji
Re: Reusing OpenMP front end infrastructure for OpenACC -- how?
On 09/05/13 16:02, Jakub Jelinek wrote:
> I think there is no point in renaming the existing stuff, we use it for
> some Cilk+ stuff too these days, renaming could only complicate
> maintenance, making it harder to backport OpenMP bugfixes to older
> release branches etc.  IMHO just use from the OpenMP parsing, trees and
> gimple stuff whatever is usable for OpenACC too, and just for stuff
> that doesn't have counterparts add oacc/OACC stuff.

Yeah, I agree.  Names are sticky.

nathan
Re: Questions about LTO infrastructure and pragma omp target
On 17 Sep 14:12, Jakub Jelinek wrote:
> On Tue, Sep 17, 2013 at 01:56:39PM +0200, Richard Biener wrote:
> > Are you sure we have the same IL for all targets and the same targets
> > for all functions?  That would certainly simplify things, but you still need
> > a way to tell the target compiler which symbol to emit the function on
> > as the compile-stage will already necessarily refer to all target
> > variant symbols.
>
> This has been discussed to some extent during Cauldron.
> Yes, there are various target dependencies in the GIMPLE IL, many of them
> very early.
> Some of the dependencies are there already during preprocessing, there is
> nothing to do about those.
> For some things we will just rely on the host and target having the same
> properties, stuff like BITS_PER_UNIT, type layout/alignment, endianness,
> the OpenMP (and I believe OpenACC too) model effectively requires that,
> while you don't need to have shared address space between host and target
> (but can have that), for the mapping/unmapping it is assumed that you can
> simply take host portions of memory and copy them over to the target device
> or back, as sequence of bytes, there is no form of RPC or similar that would
> tweak endianness, differently sized types, padding, etc.
> While you can say have 64-bit host and 32-bit target or vice versa, the
> target IL will simply contain precision info, alignment, structure layout
> etc. and just will have to generate right code for that (something that is
> native long on the host can be native long long on the target or vice versa
> etc.).
> Then there are dependencies we'd ideally get rid of, at least pre-IPA,
> stuff like BRANCH_COST, but generally that is just an optimization issue and
> thus not that big deal.
> Bigger issue are target specific builtins, I guess we'll either have to just
> sorry on those, or have some helper targhook that will translate a subset of
> md builtins from selected hosts to selected targets.
> Preferably, before IPA we'd introduce as few target dependencies into the
> IL as possible, and gradually towards RTL can add more dependencies (e.g.
> the vectorizer adds so many target dependencies that at that point trying to
> use the IL for a different target is practically impossible).
>
> 	Jakub

Do I understand correctly that GIMPLE IL is target dependent, but we will emit
the same IL for all targets?

  -- Ilya
Re: Questions about LTO infrastructure and pragma omp target
On Thu, Sep 19, 2013 at 02:44:30PM +0400, Ilya Verbin wrote:
> Do I understand correctly that GIMPLE IL is target dependent, but we will emit
> the same IL for all targets?

Yes.  Some of the target dependencies are required to be inherited from the
host, some can be tolerated (optimization decisions), others can be errored
out (md builtins).

	Jakub
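
The byte-copy assumption described earlier in this thread (host memory is copied to the device and back as a plain sequence of bytes, with no translation of endianness, type sizes or padding) can be sketched roughly as below. This is only an illustration: Sample, DeviceBuffer, map_to_device and map_from_device are hypothetical names invented here, not part of GCC or any offloading runtime, and the "device" is just another host buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical payload that both host and target code would operate on.
struct Sample {
    std::int32_t id;
    double value;
};

// A raw byte copy is only meaningful for trivially copyable types whose
// layout is identical on both sides; the layout agreement itself is the
// assumption the thread says the OpenMP model relies on.
static_assert(std::is_trivially_copyable<Sample>::value,
              "Sample must be copyable as raw bytes");

// Stand-in for device memory (assumption: plain host storage here).
using DeviceBuffer = std::vector<unsigned char>;

// "Map to": copy the host objects to the device as a sequence of bytes.
DeviceBuffer map_to_device(const Sample* host, std::size_t count)
{
    DeviceBuffer dev(count * sizeof(Sample));
    if (!dev.empty())
        std::memcpy(dev.data(), host, dev.size());
    return dev;
}

// "Map from": copy the bytes back, again without any conversion step.
void map_from_device(const DeviceBuffer& dev, Sample* host, std::size_t count)
{
    if (count != 0)
        std::memcpy(host, dev.data(), count * sizeof(Sample));
}
```

Nothing in the copy knows about field types, which is why differing endianness or padding between host and target would silently corrupt the data rather than be converted.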
Re: 990208-1.c / our backend
On Thu, Sep 19, 2013 at 3:03 PM, Hendrik Greving wrote:
> Hi,
>
> I have a GCC regression test failing for our backend for -O3. I am
> posting its code below. This might be more of a C-standard question,
> but is the optimization case guaranteed not to fail from a C
> perspective? When compiling it with our backend, the 'here' labels
> actually match.
>
>
> /* As a quality of implementation issue, we should not prevent inlining
>    of function explicitly marked inline just because a label therein had
>    its address taken.  */
>
> #ifndef NO_LABEL_VALUES
> static void *ptr1, *ptr2;
> static int i = 1;
>
> static __inline__ void doit(void **pptr, int cond)
> {
>   if (cond) {
>   here:
>     *pptr = &&here;
>   }
> }
>
> static void f(int cond)
> {
>   doit (&ptr1, cond);
> }
>
> static void g(int cond)
> {
>   doit (&ptr2, cond);
> }
>
> static void bar(void);
>
> int main()
> {
>   f (i);
>   bar();
>   g (i);
>
> #ifdef __OPTIMIZE__
>   if (ptr1 == ptr2)
>     abort ();
> #endif
>
>   exit (0);
> }
>
> void bar(void) { }
>
> #else /* NO_LABEL_VALUES */
> int main() { exit(0); }
> #endif

It also failed on trunk with -O2/-O3 on x86.  But 990208-1.c has
been changed to

__attribute__ ((noinline))
static void f(int cond)
{
  doit (&ptr1, cond);
}

__attribute__ ((noinline))
static void g(int cond)
{
  doit (&ptr2, cond);
}

__attribute__ ((noinline))
static void bar(void);

--
H.J.
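
For readers who want to try the question on their own compiler, here is a minimal sketch based on the test above. It relies on the GNU labels-as-values extension, so it is not ISO C or C++. As far as I recall, the GCC manual says that `&&label` for the same label may yield different values when the containing function is inlined or cloned, which would mean neither "equal" nor "different" is guaranteed. The helpers record_label_addresses and label_addresses_match are names made up for this sketch.

```cpp
// GNU extension (labels as values): builds with g++ or clang++, not ISO C++.
static void *ptr1, *ptr2;

static inline void doit(void **pptr, int cond)
{
    if (cond) {
    here:
        *pptr = &&here;   // address of the label in this copy of doit
    }
}

// noinline keeps f and g as separate functions, as in the updated test;
// doit itself may still be inlined into each of them.
__attribute__((noinline)) static void f(int cond) { doit(&ptr1, cond); }
__attribute__((noinline)) static void g(int cond) { doit(&ptr2, cond); }

// Hypothetical helper: runs both calls and reports whether each one
// stored a label address.
bool record_label_addresses(int cond)
{
    ptr1 = ptr2 = nullptr;
    f(cond);
    g(cond);
    return ptr1 != nullptr && ptr2 != nullptr;
}

// Hypothetical helper: whether the two recorded addresses compare equal.
// The result depends on the compiler's inlining decisions, so the sketch
// deliberately does not state which answer to expect.
bool label_addresses_match() { return ptr1 == ptr2; }
```

Calling record_label_addresses(1) and then label_addresses_match() at different optimization levels shows which behaviour a given compiler and backend produce.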
Re: 990208-1.c / our backend
Thanks!
gcc-4.8-20130919 is now available
Snapshot gcc-4.8-20130919 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20130919/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch revision 202760

You'll find:

 gcc-4.8-20130919.tar.bz2             Complete GCC

  MD5=35c43ec000f3a330dd3ba143a8ddf8bb
  SHA1=8e2aa721eedb3113f5e250799241b4dd84ff7dec

Diffs from 4.8-20130912 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
990208-1.c / our backend
Hi,

I have a GCC regression test failing for our backend for -O3. I am
posting its code below. This might be more of a C-standard question,
but is the optimization case guaranteed not to fail from a C
perspective? When compiling it with our backend, the 'here' labels
actually match.


/* As a quality of implementation issue, we should not prevent inlining
   of function explicitly marked inline just because a label therein had
   its address taken.  */

#ifndef NO_LABEL_VALUES
static void *ptr1, *ptr2;
static int i = 1;

static __inline__ void doit(void **pptr, int cond)
{
  if (cond) {
  here:
    *pptr = &&here;
  }
}

static void f(int cond)
{
  doit (&ptr1, cond);
}

static void g(int cond)
{
  doit (&ptr2, cond);
}

static void bar(void);

int main()
{
  f (i);
  bar();
  g (i);

#ifdef __OPTIMIZE__
  if (ptr1 == ptr2)
    abort ();
#endif

  exit (0);
}

void bar(void) { }

#else /* NO_LABEL_VALUES */
int main() { exit(0); }
#endif
Question about clobbering registers in prologue/epilogue code
I was wondering if someone could help me find the right GCC hooks to
implement some changes in the prologue and epilogue code for the MIPS
target.  What I am trying to do is to have a flag (call it
-mfp64-compat) that will allow me to generate code in a routine that
will use the MIPS floating point unit in fp1 mode (-mfp64, 64 bit
floating point registers) but in a way that is compatible with being
called from a routine in fp0 mode (-mfp32, 32 bit floating point
registers).

The idea is that the routine is called with the FPU in fp0 mode, then we
save all the floating point registers, because switching modes leaves
the fp registers in an unknown state, and then switch to fp1 mode.  At
the return we save the f12 return register (if needed), switch back to
fp0 mode and then restore f12 and the other floating point registers
before the return.  I'll leave out the complication of making calls to
other routines from this routine for now.

I can set TARGET_FLOAT64 at the beginning of the function to generate
fp1 code, but my attempt to create an instruction that switches mode and
clobbers all the floating point registers and calling this from
expand_prologue does not seem to be working.  I generate the instruction
(currently just a nop stub) but it does not clobber the floating point
registers.  I think this is because expand_prologue is getting called
too late in the rtl expansion/code generation stream.  So I am wondering
where I should add this instruction into the rtl stream?  Do I need to
create a new rtl pass I could run immediately after the trees are
expanded into rtl?

I have attached a GCC patch file that I have been experimenting with
so far.  If I compile a routine with -mfp64-compat I get my nop
generated by expand_prologue but I do not get the save/restore of the
floating point registers that I was hoping for.  If I call
__builtin_mips_switch_fp_mode(0) explicitly then I do see the fp
registers get saved and restored.
Steve Ellcey
sell...@mips.com


diff --git a/gcc/config/mips/mips-ftypes.def b/gcc/config/mips/mips-ftypes.def
index 1268c53..7ccbca5 100644
--- a/gcc/config/mips/mips-ftypes.def
+++ b/gcc/config/mips/mips-ftypes.def
@@ -124,3 +124,4 @@ DEF_MIPS_FTYPE (2, (VOID, SI, CVPOINTER))
 DEF_MIPS_FTYPE (2, (VOID, SI, SI))
 DEF_MIPS_FTYPE (2, (VOID, V2HI, V2HI))
 DEF_MIPS_FTYPE (2, (VOID, V4QI, V4QI))
+DEF_MIPS_FTYPE (1, (VOID, SI))
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5993aab..573c136 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -682,6 +682,7 @@ static const struct attribute_spec mips_attribute_table[] = {
   { "micromips", 0, 0, true, false, false, NULL, false },
   { "nomicromips", 0, 0, true, false, false, NULL, false },
   { "nocompression", 0, 0, true, false, false, NULL, false },
+  { "fp64_compat", 0, 0, true, false, false, NULL, false },
   /* Allow functions to be specified as interrupt handlers */
   { "interrupt", 0, 0, false, true, true, NULL, false },
   { "use_shadow_register_set", 0, 0, false, true, true, NULL, false },
@@ -11224,6 +11225,12 @@ mips_expand_prologue (void)
      the call to mcount.  */
   if (crtl->profile)
     emit_insn (gen_blockage ());
+
+  if (TARGET_FP64_COMPAT)
+    {
+      emit_insn (gen_mips_switch_fp_mode (GEN_INT (0)));
+      emit_insn (gen_blockage ());
+    }
 }
 
 /* Attach all pending register saves to the previous instruction.
@@ -13620,6 +13627,7 @@ AVAIL_NON_MIPS16 (dsp_64, TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
 AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_VECTORS)
 AVAIL_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
+AVAIL_NON_MIPS16 (fp, TARGET_FP64_COMPAT)
 
 /* Construct a mips_builtin_description from the given arguments.
 
@@ -14059,7 +14067,9 @@ static const struct mips_builtin_description mips_builtins[] = {
   LOONGSON_BUILTIN_SUFFIX (punpcklwd, s, MIPS_V2SI_FTYPE_V2SI_V2SI),
 
   /* Sundry other built-in functions.  */
-  DIRECT_NO_TARGET_BUILTIN (cache, MIPS_VOID_FTYPE_SI_CVPOINTER, cache)
+  DIRECT_NO_TARGET_BUILTIN (cache, MIPS_VOID_FTYPE_SI_CVPOINTER, cache),
+
+  DIRECT_NO_TARGET_BUILTIN (switch_fp_mode, MIPS_VOID_FTYPE_SI, fp)
 };
 
 /* Index I is the function declaration for mips_builtins[I], or null if the
@@ -16634,6 +16644,9 @@ static void
 mips_set_current_function (tree fndecl)
 {
   mips_set_compression_mode (mips_get_compress_mode (fndecl));
+  if (fndecl \
+      && lookup_attribute ("fp64_compat", DECL_ATTRIBUTES (fndecl)) != NULL)
+    target_flags |= MASK_FP64_COMPAT;
 }
 
 /* Allocate a chunk of memory for per-function machine-dependent data.  */
@@ -16761,6 +16774,12 @@ mips_option_override (void)
       target_flags_explicit |= MASK_SOFT_FLOAT_ABI;
     }
 
+  if (TARGET_FP64_COMPAT)
+    {
+      target_flags |= MASK_FLOAT64;
+      target_flags_explicit |= MASK_FLOAT64;
+    }
+
   if (TARGET_FLIP_MIPS16)
     TARGET_INTERLINK_COMPRESSED = 1;
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mip
Re: Question about clobbering registers in prologue/epilogue code
On 09/19/2013 04:06 PM, Steve Ellcey wrote:
> I was wondering if someone could help me find the right GCC hooks to
> implement some changes in the prologue and epilogue code for the MIPS
> target.  What I am trying to do is to have a flag (call it
> -mfp64-compat) that will allow me to generate code in a routine that
> will use the MIPS floating point unit in fp1 mode (-mfp64, 64 bit
> floating point registers) but in a way that is compatible with being
> called from a routine in fp0 mode (-mfp32, 32 bit floating point
> registers).
>
> The idea is that the routine is called with the FPU in fp0 mode, then we
> save all the floating point registers, because switching modes leaves
> the fp registers in an unknown state, and then switch to fp1 mode.  At
> the return we save the f12 return register (if needed), switch back to
> fp0 mode and then restore f12 and the other floating point registers
> before the return.  I'll leave out the complication of making calls to
> other routines from this routine for now.
>
> I can set TARGET_FLOAT64 at the beginning of the function to generate
> fp1 code, but my attempt to create an instruction that switches mode and
> clobbers all the floating point registers and calling this from
> expand_prologue does not seem to be working.  I generate the instruction
> (currently just a nop stub) but it does not clobber the floating point
> registers.  I think this is because expand_prologue is getting called
> too late in the rtl expansion/code generation stream.  So I am wondering
> where I should add this instruction into the rtl stream?  Do I need to
> create a new rtl pass I could run immediately after the trees are
> expanded into rtl?
>
> I have attached a GCC patch file that I have been experimenting with
> so far.  If I compile a routine with -mfp64-compat I get my nop
> generated by expand_prologue but I do not get the save/restore of the
> floating point registers that I was hoping for.  If I call
> __builtin_mips_switch_fp_mode(0) explicitly then I do see the fp
> registers get saved and restored.
Can you open code those saves/restores yourself in
mips_expand_prologue/mips_expand_epilogue?  At least for a first cut
that might be easiest.

The register allocator isn't ever going to see the prologue/epilogue
insns, so if you're depending on it to set regs_ever_live and friends to
trigger the saves/restores, then it's not going to work.

jeff
is there a optimizing opportunity for const std::vector + std::initializer_list replaced with std::array?
gcc 4.8.1, -O3 -march=native -std=c++11

small example program to check what the gcc 4.8.1 optimizer does with
const std::vector/std::array + simple operations

---
#include <vector>
#include <array>
#include <numeric>

#define USE_ARRAY

#if defined(USE_ARRAY)
static int calc(const std::array<int,3> p_ints, const int& p_init)
#else
static int calc(const std::vector<int> p_ints, const int& p_init)
#endif
{
  return std::accumulate(p_ints.begin(), p_ints.end(), p_init);
}

int main()
{
  const int result = calc({10,20,30},100);
  return result;
}
---

gcc produces this code if USE_ARRAY is defined

main:
  mov eax, 160
  ret

if USE_ARRAY is undefined (and vector is in use) it produces

main:
  push rbx
  mov edi, 12
  call operator new(unsigned long)
  mov rdx, QWORD PTR ._81[rip]
  mov rdi, rax
  mov QWORD PTR [rax], rdx
  mov eax, DWORD PTR ._81[rip+8]
  mov rsi, rdx
  shr rsi, 32
  lea ebx, [rsi+100+rdx]
  add ebx, eax
  test rdi, rdi
  mov DWORD PTR [rdi+8], eax
  je .L2
  call operator delete(void*)
.L2:
  mov eax, ebx
  pop rbx
  ret
._81:
  .long 10
  .long 20
  .long 30

so my question is - can gcc replace/substitute the const std::vector with
const std::array in such const situations, to get better optimizer
results, or is the STL itself responsible for being optimizable like
that - or does that break any standard definitions?
btw: clang 3.3 produces much more code for both cases - nearly equal
using array/vector (except new/delete)

main:                                   # @main
  movabsq $85899345930, %rax      # imm = 0x140000000A
  movq %rax, -16(%rsp)
  movl $100, %esi
  movl $30, -8(%rsp)
  xorl %edx, %edx
  leaq -16(%rsp), %rcx
  movb $1, %al
  testb %al, %al
  jne .LBB0_1
  movd %esi, %xmm1
  pxor %xmm0, %xmm0
  xorl %eax, %eax
.LBB0_3:                                # %vector.body.i.i
  movdqu (%rsp,%rax,4), %xmm2
  paddd %xmm2, %xmm0
  movdqu -16(%rsp,%rax,4), %xmm2
  paddd %xmm2, %xmm1
  addq $8, %rax
  cmpq %rax, %rdx
  jne .LBB0_3
  jmp .LBB0_4
.LBB0_1:
  pxor %xmm0, %xmm0
  movd %esi, %xmm1
.LBB0_4:                                # %middle.block.i.i
  movl $3, %esi
  paddd %xmm1, %xmm0
  movdqa %xmm0, %xmm1
  movhlps %xmm1, %xmm1            # xmm1 = xmm1[1,1]
  paddd %xmm0, %xmm1
  phaddd %xmm1, %xmm1
  movd %xmm1, %eax
  cmpq %rdx, %rsi
  je .LBB0_7
  addq $-12, %rcx
  leaq -16(%rsp), %rdx
.LBB0_6:                                # %scalar.ph.i.i
  addl 12(%rcx), %eax
  addq $4, %rcx
  cmpq %rcx, %rdx
  jne .LBB0_6
.LBB0_7:                                # %_ZL4calcSt5arrayIiLm3EERKi.exit
  ret
Re: is there a optimizing opportunity for const std::vector + std::initializer_list replaced with std::array?
(gcc-h...@gcc.gnu.org would have been a better list)

On Fri, 20 Sep 2013, Dennis Luehring wrote:

> gcc 4.8.1, -O3 -march=native -std=c++11
>
> small example program to check what the gcc 4.8.1 optimizer does with
> const std::vector/std::array + simple operations
>
> ---
> #include <vector>
> #include <array>
> #include <numeric>
>
> #define USE_ARRAY
>
> #if defined(USE_ARRAY)
> static int calc(const std::array<int,3> p_ints, const int& p_init)
> #else
> static int calc(const std::vector<int> p_ints, const int& p_init)
> #endif
> {
>   return std::accumulate(p_ints.begin(), p_ints.end(), p_init);
> }
>
> int main()
> {
>   const int result = calc({10,20,30},100);
>   return result;
> }
> ---
>
> gcc produces this code if USE_ARRAY is defined
>
> main:
>   mov eax, 160
>   ret
>
> if USE_ARRAY is undefined (and vector is in use) it produces
[long expected code]
> so my question is - can gcc replace/substitute the const std::vector with
> const std::array in such const situations, to get better optimizer
> results, or is the STL itself responsible for being optimizable like
> that - or does that break any standard definitions?

We don't perform such high-level optimizations. But if you expand, inline
and simplify this program, the optimizer sees something like:

p=operator new(12);
memcpy(p,M,12); // M contains {10, 20, 30}
res=100+p[0]+p[1]+p[2];
if(p!=0) operator delete(p);

A few things that go wrong:
* because p is filled with memcpy and not with regular assignments, the
  compiler doesn't realize that p[0] is known.
* the test p != 0 is unnecessary (a patch that should help is pending
  review)
* we would then be left with:
  p=new(12); delete p; return 160;
  gcc knows how to remove free(malloc(12)) but not the C++ variant (I don't
  even know if it is legal, or what conditions and flags are required to
  make it so).

Please go to the gcc bugzilla and file an enhancement request (category
tree-optimization) if these problems are not there yet.

--
Marc Glisse
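
The simplified form described above can be written out as compilable code, which is handy for inspecting what a given compiler version makes of it. This is a sketch under assumptions: calc_expanded and calc_folded are names invented here, and the static array M stands in for the constant data the compiler would emit for the initializer list.

```cpp
#include <cstring>
#include <new>

// Stand-in for the constant initializer data ("M contains {10, 20, 30}").
static const int M[3] = {10, 20, 30};

// Roughly the shape of the std::vector version after inlining:
// allocate, bulk-copy the initializer, sum, then free.
int calc_expanded(int init)
{
    int* p = static_cast<int*>(::operator new(sizeof M));
    std::memcpy(p, M, sizeof M);          // values arrive via memcpy, not stores
    int res = init + p[0] + p[1] + p[2];
    if (p != nullptr)                     // the redundant null test from the message
        ::operator delete(p);
    return res;
}

// What the std::array version effectively reduces to: no heap traffic,
// so the sum can be folded to a constant.
int calc_folded(int init)
{
    return init + M[0] + M[1] + M[2];
}
```

Compiling both functions with `-O3 -S` and comparing the assembly should show whether the memcpy, the null test and the new/delete pair survive in calc_expanded on the compiler at hand.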
Re: Question about clobbering registers in prologue/epilogue code
Steve Ellcey writes:
> I was wondering if someone could help me find the right GCC hooks to
> implement some changes in the prologue and epilogue code for the MIPS
> target.  What I am trying to do is to have a flag (call it
> -mfp64-compat) that will allow me to generate code in a routine that
> will use the MIPS floating point unit in fp1 mode (-mfp64, 64 bit
> floating point registers) but in a way that is compatible with being
> called from a routine in fp0 mode (-mfp32, 32 bit floating point
> registers).
>
> The idea is that the routine is called with the FPU in fp0 mode, then we
> save all the floating point registers, because switching modes leaves
> the fp registers in an unknown state, and then switch to fp1 mode.  At
> the return we save the f12 return register (if needed), switch back to
> fp0 mode and then restore f12 and the other floating point registers
> before the return.  I'll leave out the complication of making calls to
> other routines from this routine for now.
>
> I can set TARGET_FLOAT64 at the beginning of the function to generate
> fp1 code, but my attempt to create an instruction that switches mode and
> clobbers all the floating point registers and calling this from
> expand_prologue does not seem to be working.  I generate the instruction
> (currently just a nop stub) but it does not clobber the floating point
> registers.  I think this is because expand_prologue is getting called
> too late in the rtl expansion/code generation stream.  So I am wondering
> where I should add this instruction into the rtl stream?  Do I need to
> create a new rtl pass I could run immediately after the trees are
> expanded into rtl?
>
> I have attached a GCC patch file that I have been experimenting with
> so far.  If I compile a routine with -mfp64-compat I get my nop
> generated by expand_prologue but I do not get the save/restore of the
> floating point registers that I was hoping for.  If I call
> __builtin_mips_switch_fp_mode(0) explicitly then I do see the fp
> registers get saved and restored.

Well, it's really backend code that works out what registers need to be
saved and restored, although it's based on generic information.  So it
should just be a case of making mips_save_reg_p return true for all FP
registers in an fp64_compat function.  The prologue would then need to
restore any incoming float arguments after the mode switch.

For something like this you'd also need to define mips_epilogue_uses
to return true for all float registers, so that none of the epilogue
restores get deleted as dead later.

This seems pretty expensive though.  Just wondering: is it related
to the -mfp64 jmp_buf thing?

Thanks,
Richard
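
The two predicates suggested above can be modelled outside GCC to show the intended logic. This is a toy sketch, not GCC code: FunctionInfo, save_reg_p, epilogue_uses and is_fp_reg are invented stand-ins for the backend functions mentioned in the reply, and the register numbering (FP registers at 32..63) is an assumption of the sketch.

```cpp
#include <bitset>

// Assumed numbering for this model only: FP registers occupy 32..63.
constexpr int kFirstFpReg = 32;
constexpr int kLastFpReg = 63;
constexpr int kNumRegs = 64;

// Hypothetical per-function state; GCC keeps this information elsewhere.
struct FunctionInfo {
    bool fp64_compat = false;           // function carries the fp64_compat marking
    std::bitset<kNumRegs> ever_live;    // registers seen in use by the allocator
    std::bitset<kNumRegs> call_saved;   // callee-saved registers under the ABI
};

bool is_fp_reg(int regno)
{
    return regno >= kFirstFpReg && regno <= kLastFpReg;
}

// Model of "must the prologue save this register?".
bool save_reg_p(const FunctionInfo& fn, int regno)
{
    // In an fp64_compat function every FP register is saved, live or not,
    // because the mode switch leaves all of them in an unknown state.
    if (fn.fp64_compat && is_fp_reg(regno))
        return true;
    // Otherwise only callee-saved registers that are actually used.
    return fn.ever_live[regno] && fn.call_saved[regno];
}

// Model of "does the epilogue use this register?". Answering true for the
// FP registers is what would keep their restores from being deleted as dead.
bool epilogue_uses(const FunctionInfo& fn, int regno)
{
    return fn.fp64_compat && is_fp_reg(regno);
}
```

The point of the model is that the decision no longer depends on liveness information for FP registers, which matches the earlier observation in this thread that the register allocator never sees prologue and epilogue insns.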