4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.
Hi,

This is regarding the gcc.c-torture/execute/vla-dealloc-1.c failure.

4.7.0 ia64-hp-hpux: program timed out (timeout 300 seconds).
4.7.0 ia64-redhat-linux: program timed out (timeout 300 seconds).
4.7.0 x86_64-suse-linux: execution completes successfully.

Inserting a printf statement in the loop path makes the executable run to completion without any issues.

4.6.3 ia64-hp-hpux: execution completes successfully.

So it looks like a regression in 4.7.0. Any suggestions as to which check-in between 4.6.3 and 4.7.0 could have caused this failure?

gcc.c-torture/execute/vla-dealloc-1.c

#if (__SIZEOF_INT__ <= 2)
#define LIMIT 1
#else
#define LIMIT 100
#endif

void *volatile p;

int
main (void)
{
  int n = 0;
  if (0)
    {
    lab:;
    }
  int x[n % 1000 + 1];
  x[0] = 1;
  x[n % 1000] = 2;
  p = x;
  n++;
  if (n < LIMIT)
    goto lab;
  return 0;
}

Regards,
Kannan
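For anyone trying to narrow this down, here is a minimal sketch of an instrumented variant of the test. It is not part of the testsuite: the function name run_vla_loop and the spread output parameter are made up for illustration. It returns the number of iterations performed and records how far apart the observed VLA addresses were, which may help tell a non-terminating loop apart from a stack that keeps growing.

```c
#include <assert.h>
#include <stdint.h>

/* Stored through a volatile pointer, as in the original test, so the
   array cannot simply be optimized away. */
static void *volatile p;

/* Hypothetical helper: runs the branch-back loop `limit` times and
   returns the iteration count.  *spread receives the distance between
   the lowest and highest address at which the VLA was observed. */
static int
run_vla_loop (int limit, uintptr_t *spread)
{
  uintptr_t lo = UINTPTR_MAX, hi = 0;
  int n = 0;

  assert (limit > 0);
  if (0)
    {
    lab:;
    }
  int x[n % 1000 + 1];          /* re-created after every jump to lab */
  x[0] = 1;
  x[n % 1000] = 2;
  p = x;

  uintptr_t addr = (uintptr_t) x;
  if (addr < lo)
    lo = addr;
  if (addr > hi)
    hi = addr;

  n++;
  if (n < limit)
    goto lab;

  *spread = hi - lo;
  return n;
}
```

If the loop terminates, the return value equals limit. What counts as a reasonable spread depends on the target and optimization level, so no particular value is asserted here.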
RE: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.
Hi,

Similarly for the following two test cases, which deal with VLA de-allocation in a branch-back situation:

1. gcc.c-torture/execute/pr43220.c
2. gcc.c-torture/execute/20040811-1.c

Regards,
Kannan

-----Original Message-----
From: Mailaripillai, Kannan Jeganathan
Sent: Thursday, May 03, 2012 4:48 PM
To: 'gcc@gcc.gnu.org'
Subject: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.

[...]
Re: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.
On Thu, May 3, 2012 at 1:33 PM, Mailaripillai, Kannan Jeganathan wrote:
> Hi,
>
> Similarly for the following two test case which deals with VLA
> de-allocation in a branch back situation:
> 1. gcc.c-torture/execute/pr43220.c
> 2. gcc.c-torture/execute/20040811-1.c

Can you try --param large-stack-frame=1? Is sizeof (int) <= 2?

> Regards,
> Kannan
> [...]
Re: Porting new target architecture to GCC
Thank you all very much for your comments and advice! It certainly has helped me to gain a better perspective on porting GCC to a new architecture.

It's not directly a suggestion from my "Betreuer" (advisor); a few people suggested it as a Praktikum (practical project) or as a masters thesis, and I wanted to look into it and see what was possible.

Ben
RE: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.
> Can you try --param large-stack-frame=1?

No change. The program did not return/exit for more than 12 minutes.

> Is sizeof (int) <= 2?

No.

Regards,
Kannan

-----Original Message-----
From: Richard Guenther [mailto:richard.guent...@gmail.com]
Sent: Thursday, May 03, 2012 5:14 PM
To: Mailaripillai, Kannan Jeganathan
Cc: gcc@gcc.gnu.org
Subject: Re: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.

[...]
Re: Porting new target architecture to GCC
On 02/05/12 16:36, Ian Lance Taylor wrote:
> I don't know what a bachelor thesis is, so I don't know if this would be
> suitable. A GCC port by itself would be too simple for a masters thesis
> in the U.S.

Yes, on second thought, the Betreuer I mentioned did not say anything about a masters thesis; I just recall seeing that someone at some university had done a port for his masters thesis (albeit with GCC 2.9).
Re: Paradoxical subreg reload issue
02/05/2012 21:36, Eric Botcazou:
>> I have an issue (gcc 4.6.3, private backend) when reloading operands of
>> this insn:
>> (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>>      (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>>
>> The register 21 is reloaded into
>> (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register.
>> Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>>
>> Note that word_mode is SImode, whereas the class r0 belongs to is
>> HI-wide. I don't know if this matters when reloading.
>>
>> I have no idea how to debug this, if it is a backend or a reload bug.
>
> RA/reload is known to have issues with word-mode paradoxical subregs on
> big-endian machines. For example, on SPARC 64-bit, we run into similar
> problems for FP regs, which are 32-bit. Likewise on HP-PA 64-bit I think.
>
> So we have kludges in the back-end:
>
> /* Defines invalid mode changes. Borrowed from the PA port.
>
>    SImode loads to floating-point registers are not zero-extended.
>    The definition for LOAD_EXTEND_OP specifies that integer loads
>    narrower than BITS_PER_WORD will be zero-extended. As a result,
>    we inhibit changes from SImode unless they are to a mode that is
>    identical in size.
>
>    Likewise for SFmode, since word-mode paradoxical subregs are
>    problematic on big-endian architectures. */
>
> #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
>   (TARGET_ARCH64 \
>    && GET_MODE_SIZE (FROM) == 4 \
>    && GET_MODE_SIZE (TO) != 4 \
>    ? reg_classes_intersect_p (CLASS, FP_REGS) : 0)
>

I modified CANNOT_CHANGE_MODE_CLASS as you suggested. But strange as it may seem, it has no effect on such a reload, and I can't find a way to make it work...

BTW, has this bug already been filed? Do you have an idea how deep this bug is in reload, how complex it is, and which part of reload it is related to?

Thanks,
Aurélien
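To make the "regno is then -1" observation concrete, here is a simplified model of where a paradoxical subreg would have to start on a big-endian target. This is only a sketch: the function names are hypothetical, and it is not the logic GCC's reload actually uses. It assumes that on big-endian the narrow value occupies the last hard register of the wider group.

```c
#include <assert.h>

/* Number of hard registers needed to hold `mode_bytes` bytes when each
   hard register is `reg_bytes` wide (rounded up). */
static int
hard_regs_needed (int mode_bytes, int reg_bytes)
{
  assert (mode_bytes > 0 && reg_bytes > 0);
  return (mode_bytes + reg_bytes - 1) / reg_bytes;
}

/* Simplified model (an assumption, not GCC code): first hard register
   covered by a paradoxical subreg of width `outer_bytes` around a value
   of width `inner_bytes` living in hard register `regno`.  On a
   big-endian target the inner value is taken to sit in the last
   register of the group, so the group starts below `regno`. */
static int
paradoxical_subreg_first_regno (int regno, int inner_bytes,
                                int outer_bytes, int reg_bytes,
                                int big_endian)
{
  int extra = hard_regs_needed (outer_bytes, reg_bytes)
              - hard_regs_needed (inner_bytes, reg_bytes);
  return big_endian ? regno - extra : regno;
}
```

With the numbers from the report (QImode value in r0, SImode subreg, 2-byte registers, big-endian), this model yields -1, which matches the symptom described above; for any register other than r0 the result would be a valid register number.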
Re: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.
On Thu, May 3, 2012 at 2:02 PM, Mailaripillai, Kannan Jeganathan wrote:
>
>> Can you try --param large-stack-frame=1?
> No change. The program did not return/exit for more than 12 minutes.
>
>> Is sizeof (int) <= 2?
> No.

I suppose you should open a bug report then (or check whether one is already present for these failures). I have no idea what could have caused them, but ia64 is basically unmaintained.

Richard.

> Regards,
> Kannan
> [...]
-Os seems to remove a variable necessary for my transformation
I have been fighting with a simple problem for the past few nights. In my GIMPLE pass I create a variable, an ssa name for that variable, and then assign it to the LHS of a call I build:

    call = gimple_build_call(fndecl, 0);
    decl = create_tmp_var(TREE_TYPE(TREE_TYPE(test_decode_fndecl)), "FOO");
    ssa = make_ssa_name(decl, call);
    gimple_set_lhs(call, ssa);
    gsi_insert_before(&gsi, call, GSI_SAME_STMT);

I verify the GIMPLE and it looks proper:

    char * FOO.2;
    /* some gimple code */
    FOO.2_8 = fn();
    strcat(&mylocal, FOO.2_8);

However, when I build with -Os the GIMPLE looks like:

    char * FOO.2;
    /* some gimple code */
    fn();
    strcat(&mylocal, FOO.2_8);

And the compiler segfaults in "ptr_deref_may_alias_decl_p" for 'ptr' FOO.2_8 and 'decl' mylocal arguments. The segfault occurs because TREE_TYPE(ptr) is NULL and calling POINTER_TYPE_P(TREE_TYPE(ptr)) dereferences a NULL.

After more debugging, it is eliminate_unnecessary_stmts() which tries to remove the 'FOO.2_8 = fn();' statement. In that routine, the LHS is considered "dead" and because of that the routine sets the LHS to NULL.

So, I guess my question is, how can I force this stmt to hang around? I looked at eliminate_unnecessary_stmts, and do not see any specific flags I can set on the stmt to make 'em hang around, and I do not know what to do to make the LHS appear not "dead", even if I set TREE_USED on 'ssa' and DECL_PRESERVE_P on 'decl'.

Thanks for any information!

-Matt
Re: -Os seems to remove a variable necessary for my transformation
On Thu, May 3, 2012 at 3:30 PM, Matt Davis wrote:
> I have been fighting with a simple problem for the past few nights. In my
> GIMPLE pass I create a variable, an ssa name for that variable, and then
> assign it to the LHS of a call I build:
>
>     call = gimple_build_call(fndecl, 0);
>     decl = create_tmp_var(TREE_TYPE(TREE_TYPE(test_decode_fndecl)), "FOO");
>     ssa = make_ssa_name(decl, call);
>     gimple_set_lhs(call, ssa);
>     gsi_insert_before(&gsi, call, GSI_SAME_STMT);
>
> I verify the GIMPLE and it looks proper:
>
>     char * FOO.2;
>     /* some gimple code */
>     FOO.2_8 = fn();
>     strcat(&mylocal, FOO.2_8);

Probably you did not call update_stmt for the use stmt. You can check in the debugger with debug_immediate_uses (), which likely prints no use for FOO.2_8.

>
> However, when I build with -Os the GIMPLE looks like:
>     char * FOO.2;
>     /* some gimple code */
>     fn();
>     strcat(&mylocal, FOO.2_8);
>
> And the compiler segfaults in "ptr_deref_may_alias_decl_p" for 'ptr' FOO.2_8
> and 'decl' mylocal arguments. The segfault occurs because TREE_TYPE(ptr) is
> NULL and calling POINTER_TYPE_P(TREE_TYPE(ptr)) dereferences a NULL.

That's what happens when an SSA name is released.

> After more debugging, it is eliminate_unnecessary_stmts() which tries to
> remove the 'FOO.2_8 = fn();' statement. In that routine, the LHS is
> considered "dead" and because of that the routine sets the LHS to NULL.

It's dead if it has no uses.

> So, I guess my question is, how can I force this stmt to hang around? I
> looked at eliminate_unnecessary_stmts, and do not see any specific flags I
> can set to the stmt to make 'em hang around, and I do not know what to do
> to make LHS appear not "dead." Even if I set 'ssa' TREE_USED and 'decl' as
> DECL_PRESERVE_P.

Well, that's not how SSA uses/defs work.

> Thanks for any information!
>
> -Matt
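The point about uses can be illustrated with a toy model. To be clear, none of the types or functions below are GCC's: toy_ssa_name, toy_stmt, toy_update_stmt and toy_dce_drops_lhs are invented for this sketch. The idea it tries to capture is that dead-code elimination looks only at recorded uses, and a statement's uses are recorded only when the statement is re-scanned after being modified.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT GCC's tree or gimple types. */
struct toy_ssa_name
{
  int num_uses;                   /* recorded uses of this name */
};

struct toy_stmt
{
  struct toy_ssa_name *lhs;       /* name defined here, or NULL */
  struct toy_ssa_name *operand;   /* name read here, or NULL */
  struct toy_ssa_name *recorded;  /* operand as of the last re-scan */
};

/* Re-scan a statement so the use counts match its current operand. */
static void
toy_update_stmt (struct toy_stmt *s)
{
  assert (s != NULL);
  if (s->recorded == s->operand)
    return;
  if (s->recorded != NULL)
    s->recorded->num_uses--;
  if (s->operand != NULL)
    s->operand->num_uses++;
  s->recorded = s->operand;
}

/* Toy dead-code elimination: a definition whose name has no recorded
   uses loses its LHS.  Returns 1 if the LHS was dropped, else 0. */
static int
toy_dce_drops_lhs (struct toy_stmt *def)
{
  assert (def != NULL);
  if (def->lhs != NULL && def->lhs->num_uses == 0)
    {
      def->lhs = NULL;
      return 1;
    }
  return 0;
}
```

In this model, changing a statement's operand without re-scanning it leaves the defining statement looking dead, which is the shape of the symptom in the report. Flags on the variable play no part in the decision.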
Re: Missing compile-time warning for orphaned memory
Hi gcc team,

The following code creates orphaned memory by ignoring the return value of new:

#include <iostream>
using namespace std;

int main ( void )
{
    for (int i=0; i< 1000; i++)
        new int[1000];
    int a;
    cin >> a;
    return 0;
}

Should g++ report a compile-time warning for this case?

root@quant:/tmp# g++ -Wall -Wextra MemoryLeakCheckCompilerWarning.cpp -o MemoryLeakCheckCompilerWarning.exe
root@quant:/tmp#

Warm regards,
-S
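As a point of comparison in C, GCC already offers the warn_unused_result function attribute, which makes the compiler warn when a caller discards a function's return value. The sketch below applies it to a hypothetical allocation wrapper (alloc_ints and use_and_release are made-up names). It says nothing about how a warning for a discarded new-expression would be implemented in g++.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper.  With warn_unused_result, a bare call such as
   `alloc_ints (1000);` whose result is discarded draws a warning. */
__attribute__ ((warn_unused_result))
static int *
alloc_ints (size_t count)
{
  assert (count > 0);
  return malloc (count * sizeof (int));
}

/* Allocates, touches and frees a block.  Returns 1 on success and 0 if
   the allocation failed. */
static int
use_and_release (size_t count)
{
  int *block = alloc_ints (count);   /* result is kept: no warning */
  if (block == NULL)
    return 0;
  block[0] = 42;
  int ok = (block[0] == 42);
  free (block);
  return ok;
}
```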
clear_cache on Alpha architecture not implemented?
On 05-03 09:19, Camm Maguire wrote:
> Greetings! The build succeeded, which, alas, means the failure at
>
> http://buildd.debian-ports.org/status/fetch.php?pkg=axiom&arch=alpha&ver=20120301-3&stamp=1335722507
>
> is again unreproducible, like the kfree_amd64 and ppc examples. Sigh.
> Are there known differences in the cpu cache clearing instructions
> between imago and your machine?
>
> In any case, can one upload by hand builds to debian-ports like one can
> into the official repository?
>
> Take care,
> --
> Camm Maguire c...@maguirefamily.org
> ==
> "The earth is but one country, and mankind its citizens." -- Baha'u'llah

After lots of investigation, it looks like imb (I-cache memory barrier, used for flushing the instruction cache on a single CPU) is not emitted or called when doing __builtin___clear_cache!

It shouldn't be hard to add, for example using (define_expand "clear_cache" ... in gcc/config/alpha/alpha.md in gcc, or in libgcc/config/alpha/linux.h as #define CLEAR_INSN_CACHE(beg, end) ...

Either of them should just emit "call_pal 0x86". It is better than "imb" because imb is not implemented in the Tru64 assembler, and better than "call_pal pal_imb", because pal_imb (which is equal to 134) requires an extra #include.

Beyond that it would be necessary to update the kernel to detect that user code called imb, by checking whether the proper flag in the HWPCB structure is set, to call smp_imb() if necessary on a multi-processor system (probably when rescheduling this process on a different CPU), and to clear the HWPCB flag, because imb invalidates the Icache only on the running CPU.

The build of the axiom package succeeded on my machine probably by luck, and because it is a single-CPU machine with a smaller cache than imago (and an older subarchitecture, thus probably less aggressive caching).

What do the alpha maintainers think?

Regards,
Witek

--
Witold Baryluk
Re: Missing compile-time warning for orphaned memory
On 3 May 2012 17:16, Samuel David wrote: > Hi gcc team, > > The following code creates orphaned memory by ignoring the return value of > new: [...] > Should g++ report a compile-time warning for this case? Maybe. If you think so then please open a bugzilla report asking for a new warning, thanks.
Re: Missing compile-time warning for orphaned memory
On Thu, May 3, 2012 at 1:29 PM, Jonathan Wakely wrote: > > Maybe. If you think so then please open a bugzilla report asking for > a new warning, thanks. Done! http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53215 With sincere regards, -S
Re: clear_cache on Alpha architecture not implemented?
Greetings, and thanks for looking into this!

Witold Baryluk writes:
> [...]
> After lots of investigation, it looks that imb (I-cache memory barrier,
> used for flushing instruction cache on single cpu), is not emited or
> called when doing __builtin___clear_cache!
> [...]
> What alpha maintainers thinks?

Pending a true fix, here is the gcl situation when built on alpha:

checking for main in -lX11... yes
checking for xdr_double... yes
checking __builtin___clear_cache... yes

and the relevant gcl code:

#define PAL_imb 134
#define imb() \
  __asm__ __volatile__ ("call_pal %0 #imb" : : "i" (PAL_imb) : "memory")
#define CLEAR_CACHE imb()

#ifdef HAVE_BUILTIN_CLEAR_CACHE
static int
clear_protect_memory(object memory) {

  void *p,*pe;
  int i;

  p=(void *)((unsigned long)memory->cfd.cfd_start & ~(PAGESIZE-1));
  pe=(void *)((unsigned long)(memory->cfd.cfd_start+memory->cfd.cfd_size) & ~(PAGESIZE-1)) + PAGESIZE-1;
  i=mprotect(p,pe-p,PROT_READ|PROT_WRITE|PROT_EXEC);
  __builtin___clear_cache((void *)memory->cfd.cfd_start,(void *)memory->cfd.cfd_start+memory->cfd.cfd_size);
  return i;

}
#endif

#ifdef HAVE_BUILTIN_CLEAR_CACHE
  massert(!clear_protect_memory(memory));
#else
#ifdef CLEAR_CACHE
  CLEAR_CACHE;
#endif
#endif

The goal was to exercise the very helpful gcc __builtin___clear_cache support, and to avoid having to maintain our own assembler for all the different cpus in this regard. Clearly, it is easy to revert this on a per-architecture basis if absolutely necessary. Whether or not gcc plans on fixing this, please let me know so gcl can adjust as needed.

Take care,
--
Camm Maguire c...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens." -- Baha'u'llah
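The page arithmetic in clear_protect_memory can be written as two small helpers. This is a sketch with hypothetical names (page_floor, page_ceil), and it uses an exclusive, page-aligned end address, which is a slightly different convention from the snippet above, where pe points at the last byte of the final page.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to the start of its page.  page_size must be a
   non-zero power of two for the mask trick to be valid. */
static uintptr_t
page_floor (uintptr_t addr, uintptr_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);
  return addr & ~(page_size - 1);
}

/* Round addr up to the next page boundary; an address that is already
   aligned is returned unchanged. */
static uintptr_t
page_ceil (uintptr_t addr, uintptr_t page_size)
{
  return page_floor (addr + page_size - 1, page_size);
}
```

Under this convention the range handed to mprotect would start at page_floor (start) and have length page_ceil (start + size) - page_floor (start), always a whole number of pages.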
Re: clear_cache on Alpha architecture not implemented?
On 05/03/2012 10:51 AM, Camm Maguire wrote:
> The goal was to exercise the very helpful gcc __builtin___clear_cache
> support, and to avoid having to maintain our own assembler for all the
> different cpus in this regard. Clearly, it is easy to revert this on a
> per architecture basis if absolutely necessary. If gcc does or does not
> plan on fixing this, please let me know so gcl can adjust as needed.

While we can probably fix this, you should know that __builtin___clear_cache is highly tied to the implementation of trampolines for the target. Thus there are at least 3 targets that do not handle this "properly":

For alpha, we emit imb directly during the trampoline_init target hook.

For powerpc32, the libgcc routine __clear_cache is unimplemented, but the cache flushing for trampolines is inside the __trampoline_setup routine.

For powerpc64 and ia64, the ABI for function calls allows trampolines to be implemented without emitting any insns, and thus the icache need not be flushed at all. And thus we never bothered implementing __builtin___clear_cache.

So, the fact of the matter is that you can't reliably use this builtin for arbitrary targets for any gcc version up to 4.7. Feel free to submit an enhancement request via bugzilla so that we can remember to address this for gcc 4.8.

r~
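For reference, the usual calling pattern for the builtin looks roughly like the sketch below. The helper name install_code is hypothetical, and the sketch only copies bytes and requests the flush; it never executes them. Per the explanation above, on some targets and gcc versions the builtin may expand to nothing, so this shows the interface and is not a portability guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes of generated code into `dst`,
   then ask the compiler to make the instruction cache coherent for
   that range.  Returns the number of bytes installed. */
static size_t
install_code (unsigned char *dst, const unsigned char *src, size_t len)
{
  assert (dst != NULL && src != NULL);
  memcpy (dst, src, len);
  /* Arguments are the begin and (exclusive) end address of the range. */
  __builtin___clear_cache ((char *) dst, (char *) dst + len);
  return len;
}
```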
GNU Tools Cauldron 2012 - Hotels and registered presentations
An update on the GNU Tools Cauldron (http://gcc.gnu.org/wiki/cauldron2012) We have published an initial schedule for the workshop. It is available at http://gcc.gnu.org/wiki/cauldron2012. Presenters, please double-check your entries. If you find anything missing or wrong, please contact me and I will correct it. Thank you.
Re: clear_cache on Alpha architecture not implemented?
On 05-03 11:34, Richard Henderson wrote:
> [...]
> While we can probably fix this, you should know that __builtin_clear_cache
> is highly tied to the implementation of trampolines for the target. Thus
> there are at least 3 targets that do not handle this "properly":
>
> For alpha, we emit imb directly during the trampoline_init target hook.
> [...]
> So, the fact of the matter is that you can't reliably use this builtin for
> arbitrary targets for any gcc version up to 4.7. Feel free to submit an
> enhancement request via bugzilla so that we can remember to address this
> for gcc 4.8.
>
> r~

__builtin___clear_cache was introduced in gcc 4.3.0 (March 2008), so I understand how alpha could have been omitted.

I believe that on alpha the trampoline init just emits imb directly using inline asm. Implementing the necessary part in clear_cache should not break it, and will actually make it possible to simplify it in the future. A kernel-side improvement should not break trampoline init either. Right now it is more a matter of luck that it works on multi-CPU systems. The Alpha Architecture Reference Manual does say that an Alpha implementation need not guarantee that imb invalidates the Icache on other CPUs.
It currently works maybe because the actual implementation does invalidate the Icache even on multi-CPU systems, or because trampoline init happens very early, definitely before any threads other than main are started, so with high probability the process will not be migrated to another physical CPU.

I cannot find in the kernel any actual code for handling invalidation of the Icache requested from userspace, so I currently assume it is not implemented. But it should be.

Generally, Icache invalidation on alpha is IMHO designed badly, because any user can invalidate the whole Icache in the whole system (including other users'/processes' code and kernel code), thus decreasing performance considerably. Many other architectures take an explicit memory range for such an operation, and ownership of this memory region is checked (by hardware or kernel). I understand it is probably an artificial problem now (due to the small importance of the alpha arch nowadays), but there are possible workarounds for handling such misbehaving processes. Anyway, a process can still invalidate the Icache without kernel help, by just starting multiple threads, pinning them to all processors and doing imb in userspace. Despite it being an unprivileged userspace PALcode call, there is probably some way to trap the imb call (maybe even without patching the actual PALcode) and handle it in kernel space?

The problem with __builtin___clear_cache is that it defaults to a no-op, and code like axiom, which checks whether __builtin___clear_cache is present, assumes that presence equals support. I think __builtin___clear_cache should at least default to a compile-time warning about its unimplemented status. (Architectures which don't need cache invalidation, like x86 or amd64, would not emit such a warning; if any other architecture needs to suppress it too, it can always just make sure CLEAR_INSN_CACHE is defined as a constant macro.)
If you do not want to do this (change the defaults), then it is probably better to make sure that using __builtin___clear_cache on architectures like Itanium, PowerPC or Alpha actually produces an error at compile time.

As for the brokenness again, it is of course better to add support for __builtin___clear_cache (especially as it is not hard to), but there will always be new archs, and changing the defaults will be better for future archs and for those which are less maintained. It took me a few hours to find the problem in axiom and gcc. Stopping compilation with an error on such architectures would be the best thing.

This would make __builtin___clear_cache reliable for clearing the cache, and would make it possible to clean up the code of programs similar to gcl (mainly various compilers, JITs and interpreters). Isn't this what a compiler is for? To abstract machine-related differences, like cache handling or vector manipulations?

Similarly, atomic instructions should be available in gcc in an abstract way. Currently, software which, for example, adds two 64-bit numbers atomically in memory needs to steal code from the kernel or other libraries. This is the compiler's job, and should sit in compiler as buil
gcc-4.5-20120503 is now available
Snapshot gcc-4.5-20120503 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20120503/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.5 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch revision 187123 You'll find: gcc-4.5-20120503.tar.bz2 Complete GCC MD5=5e03e53436893d24f7bd29ce3dd3f7c4 SHA1=e60271c72fa3d4684078f92b388ee73d0f67240a Diffs from 4.5-20120426 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.5 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.