Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
Robert Dewar <[EMAIL PROTECTED]> writes:

| Gabriel Dos Reis wrote:
|
| >Maybe that is the case for Ada; for the C or C++ standards, you'll
| > have to define "good reason". -- Gaby
| >
| Again, I suggest that vague high level discussion is a waste of time
| here,

I wholeheartedly agree, that is why I find you need to back up your
universal claim.

-- Gaby
Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
Gabriel Dos Reis wrote:

> Robert Dewar <[EMAIL PROTECTED]> writes:
>
> | Gabriel Dos Reis wrote:
> |
> | >Maybe that is the case for Ada; for the C or C++ standards, you'll
> | > have to define "good reason". -- Gaby
> | >
> | Again, I suggest that vague high level discussion is a waste of time
> | here,
>
> I wholeheartedly agree, that is why I find you need to back up your
> universal claim.

You mean my claim that the standards group knows what it is doing and
is not stupid? :-)

Well, I could give many examples, but I still think it is more useful to
get back to the original intent of Paul's thread, which is to describe
specific semantics for some of these cases. Otherwise we are really
discussing entirely irrelevant matters.

> -- Gaby
Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
Robert Dewar <[EMAIL PROTECTED]> writes:

| Gabriel Dos Reis wrote:
|
| >Robert Dewar <[EMAIL PROTECTED]> writes:
| >
| >| Gabriel Dos Reis wrote:
| >|
| >| >Maybe that is the case for Ada; for the C or C++ standards, you'll
| >| > have to define "good reason". -- Gaby
| >| >
| >| Again, I suggest that vague high level discussion is a waste of time
| >| here,
| >
| >I wholeheartedly agree, that is why I find you need to back up your
| >universal claim.
| >
| You mean my claim that the standards group knows what it is doing and
| is not stupid? :-)

Your claim was this:

# IN every case where the standard specifies undefined behavior, it
# has a very good reason for doing so.

| Well I could give many examples,

unless "many" == "exhaustive", you'll just have wasted time and
bandwidth :-)

| but I still think it is more useful to get back to the original intent
| of Paul's thread,

but that is exactly the point! We can't usefully discuss his intent
with unsupported claims. I predict we will end up changing nothing for
GCC; I also predict that he'll be unsatisfied with unsupported claims;
then, there is a high chance the discussion will be restarted again in
a different form (hey, this is not the first time you, Paul and me are
involved in this kind of undefined discussion). If we have good answers
with backed up claims, we could just reference it later.

| which is to describe specific semantics for some of these cases.
| Otherwise we are really discussing entirely irrelevant matters.

I think supported answers will help diminish undefined discussion
frequency. If you don't have a definition for "good reason", that is
fine. Let's move on to something else; this is a new year.

-- Gaby
Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
Gabriel Dos Reis wrote:

> Robert Dewar <[EMAIL PROTECTED]> writes:
>
> | Gabriel Dos Reis wrote:
> |
> | >Robert Dewar <[EMAIL PROTECTED]> writes:
> | >
> | >| Gabriel Dos Reis wrote:
> | >|
> | >| >Maybe that is the case for Ada; for the C or C++ standards, you'll
> | >| > have to define "good reason". -- Gaby
> | >| >
> | >| Again, I suggest that vague high level discussion is a waste of time
> | >| here,
> | >
> | >I wholeheartedly agree, that is why I find you need to back up your
> | >universal claim.
> | >
> | You mean my claim that the standards group knows what it is doing and
> | is not stupid? :-)
>
> Your claim was this:
>
> # IN every case where the standard specifies undefined behavior, it
> # has a very good reason for doing so.
>
> | Well I could give many examples,
>
> unless "many" == "exhaustive", you'll just have wasted time and
> bandwidth :-)

Right, which is why I decline :-)

> | but I still think it is more useful to get back to the original intent
> | of Paul's thread,
>
> but that is exactly the point! We can't usefully discuss his intent
> with unsupported claims.

I agree, we can only discuss his intent if he gives specific examples.

> I predict we will end up changing nothing for GCC; I also predict that
> he'll be unsatisfied with unsupported claims; then, there is a high
> chance the discussion will be restarted again in a different form
> (hey, this is not the first time you, Paul and me are involved in this
> kind of undefined discussion). If we have good answers with backed up
> claims, we could just reference it later.

I agree with your prediction.

> | which is to describe specific semantics for some of these cases.
> | Otherwise we are really discussing entirely irrelevant matters.
>
> I think supported answers will help diminish undefined discussion
> frequency. If you don't have a definition for "good reason", that is
> fine. Let's move on to something else; this is a new year.

My definition of good reason does not differ from the dictionary
definition, nothing special.

My point is that at least for the cases I am aware of, the decision to
make something undefined in the standard makes sense, and it is not
"ab initio" obvious what it would mean to say that the semantics should
be those of the native target. That does not mean it is inappropriate
to define them further in specific cases. As you know, my general view
is that GCC is a little too ready to take advantage of undefined
behavior in its optimization approach, so I am not unsympathetic to
discussing specific cases in which this should at least optionally be
constrained. We have already discussed the wrap situation in detail,
though Paul's mention of saturating arithmetic seems way out of left
field to me, since very few common architectures implement saturating
arithmetic at the hardware level.

> -- Gaby
Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
I am not sure if the original poster meant the same as I do. What I have
in mind is optimization opportunities like the one in the following
linked list:

  struct list_element
  {
    struct list_element *p_next;
    int *p_value;
  };

  int sum_list_values(const struct list_element *p_list)
  {
    int sum=0;
    for ( ; p_list ; p_list= p_list->p_next)
      if (p_list->p_value)
        sum += *p_list->p_value;
    return sum;
  }

Now, if the programmer wants to gain 5% performance (or more), he might
mmap /dev/zero to address 0x00 such that
  *(int*)0 == 0
Then, the compiler or the coder could unroll the loop to minimize
conditional branches:

  int sum_list_values(const struct list_element *p_list)
  {
    int sum=0;
    while (p_list)
    {
      sum += *p_list->p_value; // undefined if p_value == NULL
      p_list= p_list->p_next;
      sum += *p_list->p_value; // and undefined if p_list == NULL
      p_list= p_list->p_next;
    }
    return sum;
  }

It could be a big gain on architectures that can't do effective
predication of the load from p_list->p_value. I once heard that xlC did
something like that on AIX, automatically.

Does GCC take advantage of, e.g., AIX's mapping of address 0x to zeros?
Can GCC be easily taught to do so?

-- Michael

Quoting Mike Stump <[EMAIL PROTECTED]>:

> On Dec 31, 2005, at 10:51 AM, Paul Schlie wrote:
> > As although C/C++ define some expressions as having undefined
> > semantics;
>
> I'd rather it be called --do-what-i-mean. :-)
>
> Could you give us a hint at what all the semantics you would want to
> change with this option? Are there any code bases that you're trying
> to compile? Compilers that you're trying to be compatible with?
>
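For comparison, the branch on p_value can also be removed without relying on a readable page at address 0. The sketch below is not the technique proposed in the message above: it swaps in a sentinel, pointing every missing value at a shared zero. The names zero_value and normalize_list are hypothetical, and the approach assumes the list may be modified once before summing.

```c
#include <assert.h>
#include <stddef.h>

struct list_element {
    struct list_element *p_next;
    int *p_value;
};

/* Hypothetical shared zero that stands in for a missing value. */
static int zero_value = 0;

/* One-time pass: replace every NULL p_value with a pointer to zero_value. */
void normalize_list(struct list_element *p_list)
{
    for (; p_list; p_list = p_list->p_next)
        if (!p_list->p_value)
            p_list->p_value = &zero_value;
}

/* Sum without a per-element NULL test; requires a normalized list. */
int sum_list_values(const struct list_element *p_list)
{
    int sum = 0;
    for (; p_list; p_list = p_list->p_next) {
        assert(p_list->p_value != NULL); /* precondition, not a branch we rely on */
        sum += *p_list->p_value;
    }
    return sum;
}
```

This keeps the program defined on every target, at the cost of one extra pass over the list; it does not remove the loop-termination test the way the zero-page trick does.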
RTL alias analysis
Hi rth,

The stack space sharing you added to cfgexpand.c breaks RTL alias
analysis.

For example, the attached test case breaks for pentiumpro at -O2.
The problem apparently is that the second store to c is moved up
before the load.

This looks like a serious problem to me...

Many thanks to Honza for crafting this test case.

Gr.
Steven

extern void abort (void) __attribute__((noreturn));

union setconflict
{
  short a[20];
  int b[10];
};

int
main ()
{
  int sum = 0;
  {
    union setconflict a;
    short *c;
    c = a.a;
    asm ("": "=r" (c):"0" (c));
    *c = 0;
    asm ("": "=r" (c):"0" (c));
    sum += *c;
  }
  {
    union setconflict a;
    int *c;
    c = a.b;
    asm ("": "=r" (c):"0" (c));
    *c = 1;
    asm ("": "=r" (c):"0" (c));
    sum += *c;
  }
  printf ("%d\n",sum);
  if (sum != 1)
    abort();
  return 0;
}

        .file   "t.c"
# GNU C version 4.2.0 20060101 (experimental) (x86_64-unknown-linux-gnu)
# compiled by GNU C version 4.0.2 20050901 (prerelease) (SUSE Linux).
# GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
# options passed: -iprefix -isystem -m32 -march=pentiumpro -auxbase -O2
# -fdump-tree-vars -fomit-frame-pointer -fverbose-asm
# options enabled: -falign-loops -fargument-alias -fbranch-count-reg
# -fcaller-saves -fcommon -fcprop-registers -fcrossjumping
# -fcse-follow-jumps -fcse-skip-blocks -fdefer-pop
# -fdelete-null-pointer-checks -fearly-inlining
# -feliminate-unused-debug-types -fexpensive-optimizations -ffunction-cse
# -fgcse -fgcse-lm -fguess-branch-probability -fident -fif-conversion
# -fif-conversion2 -finline-functions-called-once -fipa-pure-const
# -fipa-reference -fipa-type-escape -fivopts -fkeep-static-consts
# -fleading-underscore -floop-optimize -floop-optimize2 -fmath-errno
# -fmerge-constants -fomit-frame-pointer -foptimize-register-move
# -foptimize-sibling-calls -fpcc-struct-return -fpeephole -fpeephole2
# -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop
# -frerun-loop-opt -fsched-interblock -fsched-spec
# -fsched-stalled-insns-dep -fschedule-insns2 -fshow-column
# -fsplit-ivs-in-unroller -fstrength-reduce -fstrict-aliasing
# -fthread-jumps -ftrapping-math -ftree-ccp -ftree-ch -ftree-copy-prop
# -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-fre
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -ftree-lrs
# -ftree-pre -ftree-salias -ftree-sink -ftree-sra -ftree-store-ccp
# -ftree-store-copy-prop -ftree-ter -ftree-vect-loop-version -ftree-vrp
# -funit-at-a-time -fverbose-asm -fzero-initialized-in-bss -m32 -m80387
# -m96bit-long-double -maccumulate-outgoing-args -malign-stringops
# -mfancy-math-387 -mfp-ret-in-387 -mieee-fp -mno-red-zone -mpush-args
# -mtls-direct-seg-refs

# Compiler executable checksum: 6d90f1c30ff8027bc6976ab2dbfe2320

        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx   #,
        andl    $-16, %esp      #,
        pushl   -4(%ecx)        #
        subl    $76, %esp       #,
        leal    28(%esp), %eax  #, tmp64
        movl    %ecx, 68(%esp)  #,
        movl    %eax, %edx      # tmp64, c
        movl    %ebx, 72(%esp)  #,
        movw    $0, (%edx)      #,* c
        movl    $1, (%eax)      #,* c
        movswl  (%edx),%ebx     #* c, sum
        movl    (%eax), %edx    #* c,
        movl    $.LC0, (%esp)   #,
        addl    %edx, %ebx      #, sum
        movl    %ebx, 4(%esp)   # sum,
        call    printf  #
        decl    %ebx    # sum
        jne     .L6     #,
        movl    68(%esp), %ecx  #,
        xorl    %eax, %eax      #
        movl    72(%esp), %ebx  #,
        addl    $76, %esp       #,
        leal    -4(%ecx), %esp  #,
        ret
.L6:
        call    abort   #
        .size   main, .-main
        .ident  "GCC: (GNU) 4.2.0 20060101 (experimental)"
        .section        .note.GNU-stack,"",@progbits
-fpic no optimization...
Happy 2006!

I was compiling the LZMA SDK (http://www.7-zip.org/, LzmaDecode.c) and,
out of curiosity, I looked at the assembler output. I noticed that when
PIC is enabled (-fpic, Linux on Intel), ebx is reserved for the global
pointer. However, LzmaDecode does not access any global data and does
not call other functions (no relocations at all), so why not use the
ebx register? With -fpic the compiler simply does not use ebx. I tried
different versions (gcc 3.4.4 from Fedora Core 3 and 4.0.2 from Fedora
Core 4) with the same result.

Frediano Ziglio (aka freddy77)
Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
> Gabriel Dos Reis wrote:
> I predict we will end up changing nothing for GCC; I also predict that
> he'll be unsatisfied with unsupported claims; then, there is a high
> chance the discussion will be restarted again in a different form
> (hey, this is not the first time you, Paul and me are involved in this
> kind of undefined discussion). If we have good answers with backed up
> claims, we could just reference it later.
> ...
> I think supported answers will help diminish undefined discussion frequency.
> If you don't have a definition for "good reason", that is fine. Let's move
> on to something else; this is a new year.

Gentlemen, I honestly don't want to promote a frivolous debate either; as
such, if it is generally agreed there are no valid conceivable reasons to
enable GCC to allow a target to specify the semantics of its particular
implementation to enable optimizers to utilize that information such that
its otherwise non-optimized behavior may be preserved during optimization,
then I simply accept that I'm alone in the belief of its significance and
utility; if not, then it seems that the only questions remaining are
related to how this may be done relatively easily within the current
optimization and target definition framework.

If specific examples are required, although likely considered obvious,
here's a few:

- x[y] = 0;           // may also be undefined if y is one past the
  if (x[y]) y = y+1;  // extent of x[], and/or result in an overflow
                      // of x[y] one past its upper bound if
                      // allocated at the upper range of the address
                      // space, and/or if allocated at the base
                      // address of 0 and y is 0 or negative; but
                      // nonetheless the machine evaluating the code
                      // will do something which is likely well
                      // defined, and must be defined if that logical
                      // behavior is to be predictable and thereby
                      // preferable during optimization.

- x = 5 << z;         // where although implementation specific, must
  if (x > 0) z = 2;   // be definable if it is to be utilized as the
                      // basis of a target specific behavior
                      // preserving optimization.
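To make the shift example concrete: a program can already pin down what an out-of-range shift count means by writing the definition out in unsigned arithmetic. The sketch below shows two candidate definitions; shl_total and shl_masked are hypothetical names, and the masked variant assumes the bit width of unsigned is a power of two. The second mirrors what 32-bit x86 shift instructions do with the count, which is one reason "the native semantics" is not a single obvious choice.

```c
#include <limits.h>

/* Candidate 1: a count of width or more shifts every bit out, giving 0. */
unsigned shl_total(unsigned x, unsigned z)
{
    const unsigned width = sizeof(unsigned) * CHAR_BIT;
    return z < width ? x << z : 0u;
}

/* Candidate 2: reduce the count modulo the width before shifting.
   Assumes width is a power of two (for example 32). */
unsigned shl_masked(unsigned x, unsigned z)
{
    const unsigned width = sizeof(unsigned) * CHAR_BIT;
    return x << (z & (width - 1u));
}
```

Both functions agree for in-range counts and differ only where the plain C expression would be undefined.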
Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
> From: Robert Dewar <[EMAIL PROTECTED]>
> though Paul's mention of saturating arithmetic seems way out of left field to
> me, since very few common architectures implement saturating arithmetic at the
> hardware level.

- Only every single DSP, and an increasing number of otherwise conventional
  and/or vector extended architectures targeted to improve the support for
  the same? (not to mention corresponding modular and/or specific algorithm
  addressing mode support, which may ideally be the target of optimized
  code mapping to improve performance, which correspondingly would ideally
  require their generic definition)

  (But I acknowledge these machines tend not to be targeted as the CPU of a
  PC, if that is the intended focus of GCC)
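The two overflow behaviors under discussion, saturating and wrapping, can each be written in portable C without overflowing a signed int. The following is a minimal sketch; sat_add and wrap_add are hypothetical names. wrap_add assumes the implementation converts an out-of-range unsigned value to int by modular reduction, which is implementation-defined in the C standard (GCC documents that behavior).

```c
#include <limits.h>

/* Saturating add: clamp to INT_MAX or INT_MIN instead of overflowing. */
int sat_add(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;   /* cannot overflow after the checks above */
}

/* Wrapping add: do the sum in unsigned arithmetic, which wraps by
   definition, then convert back (implementation-defined conversion). */
int wrap_add(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}
```

A target with hardware saturating instructions could map sat_add onto them, but the source-level meaning stays the same on every target.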
Re: RTL alias analysis
Steven Bosscher wrote:
> Hi rth,
>
> The stack space sharing you added to cfgexpand.c breaks RTL alias
> analysis.
>
> For example, the attached test case breaks for pentiumpro at -O2.
> The problem apparently is that the second store to c is moved up
> before the load.

My guess at a solution is that when A (with alias set S_a) and B (with
alias set S_b) are given the same stack slot, we should create a new
alias set S_c which is a subset of both S_a and S_b, and give the
combined stack slot that alias set.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(650) 331-3385 x713
Re: RTL alias analysis
> Steven Bosscher wrote:
> > Hi rth,
> >
> > The stack space sharing you added to cfgexpand.c breaks RTL alias
> > analysis.
> >
> > For example, the attached test case breaks for pentiumpro at -O2.
> > The problem apparently is that the second store to c is moved up
> > before the load.
>
> My guess at a solution is that when A (with alias set S_a) and B (with
> alias set S_b) are given the same stack slot, we should create a new
> alias set S_c which is a subset of both S_a and S_b, and give the
> combined stack slot that alias set.

This won't work for the indirect accesses via pointers, like in the
testcase.

Honza

>
> --
> Mark Mitchell
> CodeSourcery, LLC
> [EMAIL PROTECTED]
> (650) 331-3385 x713
Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
> From: Michael Veksler wrote:
> I am not sure if the original poster meant the same as I do. What I have
> in mind is optimization opportunities like the one in the following
> linked list:

(for reference: http://gcc.gnu.org/ml/gcc/2006-01/msg7.html )

- although not specifically what I was concerned about, it certainly seems
  reasonable as a non-portable target specific definition that may
  correspondingly be utilized as a basis of a target specific behavior
  preserving optimization.
Successful Build: gcc-4.1-20051230 i686-pc-mingw32
E:\msys\1.0\home\gcc>gcc -v
Using built-in specs.
Target: mingw32
Configured with: ../gcc-4.1-20051223/configure --host=mingw32 --build=mingw32 --target=mingw32 --enable-threads --disable-nls --enable-optimize --enable-languages=c,c++ --prefix=e:/mingw4
Thread model: win32
gcc version 4.1.0 20051230 (prerelease)

w32api 3.5
mingw-runtime 3.9
binutils 2.16.91-20050827-1

Build required manual correction of gcc-obj\gcc\Makefile:
ORIGINAL_LD_FOR_TARGET gets set to "./E:/mingw/bin/ld.exe", unlike the
other settings such as ORIGINAL_AS_FOR_TARGET=/mingw/bin/as. make
doesn't like the "E:/" and the build fails with the error

  Makefile:1277 : target pattern contains no '%'. Stop.

Setting ORIGINAL_LD_FOR_TARGET=/mingw/bin/ld.exe makes it build
successfully.

The built GCC and G++ successfully compile Trolltech Qt-4.1 Windows.
Every compiled thing works fine and fast at
-O2 -mtune=athlon64 -mmmx -msse2 !

Thanks
Parag
Re: Might a -native-semantics switch, forcing native target optimization semantics, be reasonable?
Paul Schlie wrote:

> > Gabriel Dos Reis wrote:
> > I predict we will end up changing nothing for GCC; I also predict that
> > he'll be unsatisfied with unsupported claims; then, there is a high
> > chance the discussion will be restarted again in a different form
> > (hey, this is not the first time you, Paul and me are involved in this
> > kind of undefined discussion). If we have good answers with backed up
> > claims, we could just reference it later.
> > ...
> > I think supported answers will help diminish undefined discussion frequency.
> > If you don't have a definition for "good reason", that is fine. Let's move
> > on to something else; this is a new year.
>
> Gentlemen, I honestly don't want to promote a frivolous debate either; as
> such, if it is generally agreed there are no valid conceivable reasons to
> enable GCC to allow a target to specify the semantics of its particular
> implementation to enable optimizers to utilize that information such that
> its otherwise non-optimized behavior may be preserved during optimization,
> then I simply accept that I'm alone in the belief of its significance and
> utility; if not, then it seems that the only questions remaining are
> related to how this may be done relatively easily within the current
> optimization and target definition framework.
>
> If specific examples are required, although likely considered obvious,
> here's a few:
>
> - x[y] = 0;           // may also be undefined if y is one past the
>   if (x[y]) y = y+1;  // extent of x[], and/or result in an overflow
>                       // of x[y] one past its upper bound if
>                       // allocated at the upper range of the address
>                       // space, and/or if allocated at the base
>                       // address of 0 and y is 0 or negative; but
>                       // nonetheless the machine evaluating the code
>                       // will do something which is likely well
>                       // defined, and must be defined if that logical
>                       // behavior is to be predictable and thereby
>                       // preferable during optimization.

We have been through this before: there is no requirement that
optimization preserve the behavior of undefined programs (indeed that
is often one of the primary motivations in making things undefined). It
is fine to argue that defining the semantics is useful in a particular
case, but arguing solely from the point of view of trying to preserve
observed behavior is a poor argument. Indeed the point is that
optimization is not changing the behavior; the behavior is
non-deterministic, and can change from one compilation to the next even
if optimization does not change.

> - x = 5 << z;         // where although implementation specific, must
>   if (x > 0) z = 2;   // be definable if it is to be utilized as the
>                       // basis of a target specific behavior
>                       // preserving optimization.

ditto
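A common illustration of this point, not taken from the thread, is an overflow test that itself overflows. In the sketch below (function names are hypothetical), the first test depends on undefined signed overflow, so a compiler is allowed to treat it as always true; the second asks the same question without leaving defined behavior, so every conforming compilation must give the same answer.

```c
#include <limits.h>
#include <stdbool.h>

/* Depends on signed overflow when x == INT_MAX, which is undefined;
   a compiler may legitimately fold this to "return true". */
bool can_increment_ub(int x)
{
    return x + 1 > x;
}

/* Same question, expressed without any overflow. */
bool can_increment(int x)
{
    return x < INT_MAX;
}
```

Whether the first version "works" can differ between compilers, versions, and optimization levels, which is the sense in which there is no single observed behavior to preserve.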
Re: RTL alias analysis
On Sun, 2006-01-01 at 10:22 -0800, Mark Mitchell wrote:
> Steven Bosscher wrote:
> > Hi rth,
> >
> > The stack space sharing you added to cfgexpand.c breaks RTL alias
> > analysis.
> >
> > For example, the attached test case breaks for pentiumpro at -O2.
> > The problem apparently is that the second store to c is moved up
> > before the load.
>
> My guess at a solution is that when A (with alias set S_a) and B (with
> alias set S_b) are given the same stack slot, we should create a new
> alias set S_c which is a subset of both S_a and S_b, and give the
> combined stack slot that alias set.

Won't work here, sadly, AFAIK. This is because it's not TBAA that gets
you here, and in fact, it won't help (in fact, they already should be in
the same alias set because they are union'd together).

Take a look at true_dependence, or canon_true_dependence, in alias.c,
and you'll see that there are a bunch of times that even if the alias
sets say they conflict, we will return that they don't conflict. This is
one of those times.

If this is the same testcase Steven was discussing on IRC, the real
solution is to transfer the information that the stack space sharing
knows into some simple set form, and use *that directly* in alias.c, and
check it *first*, so that if they have the same stack slot, we say there
is a dependence, even if the memory expressions/types/etc look
different.

This also lets you say for sure whether things have a different stack
slot or not, which we seem to try to fathom using the reg_*_value stuff
(why guess when we could just ask where it put them?)
The Extension IDE - EXTEIDE
EXTEIDE is freeware. Anyone can download now. Please visit http://www.exteide.com Thanks. -- Sent from the gcc - Dev forum at Nabble.com: http://www.nabble.com/The-Extension-IDE---EXTEIDE-t835749.html#a2166859