[RFC] Flow: Handle CLOBBERs like SETs if the reg stays live
I had posted this at first under a somewhat misleading subject (someone could say completely wrong); I hope it has improved now ;-)

Hello,

> Just as an FYI, Kenny and I have replaced the global liveness analyzer
> with one from df.c, and removed the need for make_accurate_live_analysis
> (ie df.c now does the partial avail liveness stuff).
>
> We are currently in the process of changing all the other users of life
> analysis to use the df.c liveness, which should solve this problem
> (since they will all then use the accurate life analysis).

Good to hear! Unfortunately this will come too late to fix the bug in gcc 4.1. What about the following workaround? This one could be fine for the 4.1.0 release:

Remember the situation after reload. r1 is used for an uninitialized variable, hence it is live from the beginning up to the read of the variable. The special liveness analyzer in global alloc, which does not consider uninitialized quantities to conflict with others, also assigns r1 to a local pseudo.

bb 1 start: r1 dead
insn 1: r1 = 0;
insn 2: use r1; clobber r1; REG_DEAD (r1) !!!
bb 1 end: r1 live

Due to the REG_DEAD note, regrename assumes it is safe to rename r1 locally to r5, resulting in:

bb 1 start: r1 live !!!
insn 1: r5 = 0;
insn 2: use r5; clobber r5; REG_DEAD (r5)
bb 1 end: r1 live

Because the live-at-start set has changed as a result of a purely local change, an assertion in flow is triggered.

The cause of the problem is that we already enter regrename with wrong liveness info, which in turn is caused by global alloc using a different liveness analyzer than the rest of the compiler. We have the register r1 live at the end of the basic block, but the last rtx writing this value is a clobber. This situation should only occur if we had overlapping live ranges which were assigned to the same hard reg. Under normal circumstances this should be considered a bug, and an assertion would be the right choice.
But since we know this can only be caused by the global alloc liveness analyzer, the flow analyzer should avoid emitting a REG_DEAD note. That way the value of the uninitialized variable (which is undefined anyway) would be clobbered, which should not cause any trouble.

The attached patch modifies mark_set_1 to consider a CLOBBER a normal SET if the clobbered register is live afterwards.

So, as an RFC: what do you (or anybody else) think about the attached patch for mainline gcc?

Bootstrapped on i686, s390 and s390x without testsuite regressions. This patch fixes the 920501-4.c regression on s390x.

Bye,

-Andreas-


2005-10-13  Andreas Krebbel  <[EMAIL PROTECTED]>

	* flow.c (mark_set_1): Handle CLOBBERs like SETs if the register
	is live afterwards.


Index: gcc/flow.c
===
RCS file: /cvs/gcc/gcc/gcc/flow.c,v
retrieving revision 1.635
diff -p -c -r1.635 flow.c
*** gcc/flow.c	22 Aug 2005 16:58:46 -	1.635
--- gcc/flow.c	13 Oct 2005 09:25:51 -
*** mark_set_1 (struct propagate_block_info
*** 2819,2825 
  	      else
  		SET_REGNO_REG_SET (pbi->local_set, i);
  	    }
! 	  if (code != CLOBBER)
  	    SET_REGNO_REG_SET (pbi->new_set, i);
  
  	  some_was_live |= needed_regno;
--- 2819,2825 
  	      else
  		SET_REGNO_REG_SET (pbi->local_set, i);
  	    }
! 	  if (code != CLOBBER || needed_regno)
  	    SET_REGNO_REG_SET (pbi->new_set, i);
  
  	  some_was_live |= needed_regno;
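The liveness problem in the two "bb 1" listings can be reproduced with a toy backward dataflow pass. This is a simplified model under stated assumptions, not flow.c: registers are bits in an unsigned mask, each insn is summarized by a hypothetical `def` mask (SETs and CLOBBERs) and `use` mask, and `live_at_start` applies the textbook equation live_before = (live_after & ~def) | use. The hypothetical `new_set_includes` mirrors only the condition changed by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not GCC's flow.c: one bit per hard register. */
typedef unsigned int regmask;
#define REG(n) (1u << (n))

/* Hypothetical insn summary: registers written (SET or CLOBBER) and read. */
struct toy_insn {
    regmask def;
    regmask use;
};

/* Backward scan over one basic block:
   live_before = (live_after & ~def) | use.  */
static regmask live_at_start(const struct toy_insn *insns, int n,
                             regmask live_at_end)
{
    regmask live = live_at_end;

    assert(n >= 0);
    for (int i = n - 1; i >= 0; i--)
        live = (live & ~insns[i].def) | insns[i].use;
    return live;
}

/* The condition touched by the patch: previously only real SETs were
   recorded in new_set; now a CLOBBER is recorded too when the register
   is still needed after the insn.  */
static bool new_set_includes(bool is_clobber, bool needed_regno)
{
    return !is_clobber || needed_regno;
}

/* bb 1 before regrename: insn 1 sets r1; insn 2 uses and clobbers r1. */
static const struct toy_insn bb_before[] = {
    { REG(1), 0 },
    { REG(1), REG(1) },
};

/* bb 1 after regrename has renamed r1 to r5 inside the block. */
static const struct toy_insn bb_after[] = {
    { REG(5), 0 },
    { REG(5), REG(5) },
};
```

With r1 live at the block end, `live_at_start(bb_before, 2, REG(1))` does not contain r1, while `live_at_start(bb_after, 2, REG(1))` does: a purely local rename changes the block's live-at-start set, which is what trips the assertion in flow.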
Re: Update on GCC moving to svn
On Fri, 14 Oct 2005, Kaveh R. Ghazi wrote: > their corresponding svn commands posted on the website? Hmm, I guess > we would need to update these pages to the svn equivalents which would > pretty much cover the basics of a how-to guide: > > http://gcc.gnu.org/cvs.html > http://gcc.gnu.org/cvswrite.html I understand patches for at least some of (contrib, maintainer-scripts, wwwdocs) to convert to svn are ready. Could whichever patches are ready be posted to gcc-patches? That way people can test them, and see if anything still needs converting. (In the case of maintainer-scripts/gcc_release, all of the mainline, 4.0 and 3.4 versions are relevant as each branch's version is used for releases from that branch. In the case of contrib/gcc_update each active branch will need its version patched but the same patch should work in each case. Otherwise I think only mainline versions are relevant.) -- Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/ [EMAIL PROTECTED] (personal mail) [EMAIL PROTECTED] (CodeSourcery mail) [EMAIL PROTECTED] (Bugzilla assignments and CCs)
Re: Update on GCC moving to svn
(Sorry for the C-n-P; my regular email is down till Sunday afternoon. Hopefully Notes won't screw this email up too badly.)

> Has any thought been put into helping the 200+ people with write
> access migrate?

Of course. There is a basic how-to guide for CVS users on the wiki, at http://gcc.gnu.org/wiki/SvnHelp. It has been there since February, and the repository it references has been there for the same amount of time. Once the new test repo is ready (as I've mentioned before, the Apple branches take a while to convert; it looks like another 5 hours are still left), I will update it with the gcc.gnu.org location.

You'll also note that for those wanting to test anonymous rsync, rsync gcc.gnu.org:: now shows a gcc-svn to rsync from. (You can start now if you want; I've been rsync'ing it from my machine as it converts. Obviously, until it's completely finished, the repo you get won't be complete, but you won't need to retransfer anything that you rsync now, since all the existing revisions are immutable.)

> I.e. a quick how-to guide for simple cvs actions and
> their corresponding svn commands posted on the website? Hmm, I guess
> we would need to update these pages to the svn equivalents which would
> pretty much cover the basics of a how-to guide:
> http://gcc.gnu.org/cvs.html
> http://gcc.gnu.org/cvswrite.html

I plan on updating these pages using the current text on the wiki (as far as the edit history shows, it was written by Giovanni and me).

> Also, we have a bunch of mirrors sites. Are these updated through
> cvs? If so, what are we doing about that?

We have one mirror site that mirrors our CVS. That was one of the people I copied on the last status email.

> Thanks,
> --Kaveh
Re: Successful build of gcc-4.0.2 on mips-sgi-irix6.5
On Wed, Oct 12, 2005 at 02:29:56PM +0200, Rainer Emrich wrote:
> Compiler version: 4.0.2
> Platform: mips-sgi-irix6.5
> configure flags:
>   --prefix=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install
>   --with-gnu-as
>   --with-as=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install/bin/as
>   --with-gnu-ld
>   --with-ld=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install/bin/ld
>   --disable-shared --enable-threads=posix --enable-haifa --disable-nls
>   --disable-libmudflap --with-gmp=/appl/shared/gnu/IRIX64/mips-sgi-irix6.5
>   --with-mpfr=/appl/shared/gnu/IRIX64/mips-sgi-irix6.5
>   --enable-languages=c,ada,c++,f95,objc

Can the resulting GCC build Emacs _and_ XEmacs?

-- 
albert chin ([EMAIL PROTECTED])
Passing va_args...
Hello,

Once I saw a gcc macro that passes variable arguments to another variable-argument function. Example:

int function_1 (int z, ...);
int function_2 (int z, ...)
{
  return function_1 (z, MACRO);
}

Does anyone remember the macro name?

TIA
Re: Linking of object files from different compilers for ARM
On Fri, Oct 14, 2005 at 09:49:42AM +0300, Yaroslav Karulin wrote: > Hello! > > I have two files: foo.c and main.c. foo.c is compiled with RVTC 2.2 > compiler. main.c is compiled with gcc compiler (configured with > --target=arm-elf). I cannot link them together using gcc linker. > But it's possible to link files if I use CodeSourcery version of gcc. > CodeSourcery guys writes that they have added full EABI support and hope > to submit it to the gcc 4.1. > So, the question is what's the difference between CodeSourcery's > version of gcc and FSF version? And is EABI support really submitted to > the gcc 4.1? The difference is that it's configured for an EABI target, not an ELF (legacy) target. Build an arm-none-eabi compiler instead of an arm-elf compiler and it should work. -- Daniel Jacobowitz CodeSourcery, LLC
Re: Successful build of gcc-4.0.2 on mips-sgi-irix6.5
Albert Chin wrote:
> On Wed, Oct 12, 2005 at 02:29:56PM +0200, Rainer Emrich wrote:
>
>> Compiler version: 4.0.2
>> Platform: mips-sgi-irix6.5
>> configure flags:
>>   --prefix=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install
>>   --with-gnu-as
>>   --with-as=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install/bin/as
>>   --with-gnu-ld
>>   --with-ld=/SCRATCH/gcc-build/IRIX64/mips-sgi-irix6.5/install/bin/ld
>>   --disable-shared --enable-threads=posix --enable-haifa --disable-nls
>>   --disable-libmudflap --with-gmp=/appl/shared/gnu/IRIX64/mips-sgi-irix6.5
>>   --with-mpfr=/appl/shared/gnu/IRIX64/mips-sgi-irix6.5
>>   --enable-languages=c,ada,c++,f95,objc
>
> Can the resulting GCC build Emacs _and_ XEmacs?

Didn't check. Perhaps I will have time to check at the beginning of November.

Rainer

-- 
Rainer Emrich
TECOSIM GmbH
Im Eichsfeld 3
65428 Rüsselsheim
Phone: +49(0)6142/8272 12
Mobile: +49(0)163/56 949 20
Fax.: +49(0)6142/8272 49
Web: www.tecosim.com
Re: Passing va_args...
Obviously, no. __VA_ARGS__ is an identifier for variadic macros. I am looking for some way to pass variable arguments to another function that receives variable arguments without using va_list.

On 10/14/05, Jairo Balart <[EMAIL PROTECTED]> wrote:
> are you looking for __VA_ARGS__?
>
> Regards,
> Jairo
>
> On Friday 14 October 2005 15:19, Kalaky wrote:
> > Hello,
> >
> > Once I saw a gcc macro that passes variables arguments to another
> > variable argument function..example:
> >
> > function_1 (int z, ...);
> > function_2 (int z, ...);
> > {
> > return function_1 (z, MACRO);
> > }
> >
> > Does anyone remember the macro name ?
> >
> > TIA
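For what it's worth, Jairo's `__VA_ARGS__` suggestion does work if `function_2` may be a macro rather than a real function: a C99 variadic macro pastes its trailing arguments into the call unchanged, so no va_list is involved. A minimal sketch, assuming a hypothetical `function_1` that sums the `z` ints following it:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic callee: returns the sum of the z int
   arguments that follow z.  */
static int function_1(int z, ...)
{
    va_list ap;
    int sum = 0;

    assert(z >= 0);
    va_start(ap, z);
    for (int i = 0; i < z; i++)
        sum += va_arg(ap, int);
    va_end(ap);
    return sum;
}

/* C99 variadic macro: the arguments after z are forwarded verbatim.
   function_2 is a macro here, not a function, so its address cannot
   be taken.  */
#define function_2(z, ...) function_1((z), __VA_ARGS__)
```

One limitation of the plain C99 form is that at least one argument must follow `z`; GCC's `, ## __VA_ARGS__` extension relaxes that.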
Re: Passing va_args...
Kalaky <[EMAIL PROTECTED]> writes: > I'am looking for some way to pass variable arguments to another > function that receives variable arguments without using va_list. This is impossible. Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
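The conventional workaround, which does use `va_list`, is to give the worker a "v" variant (as `vprintf` is to `printf`) and have every variadic front end forward its own argument list to it. A sketch with hypothetical names `vfunction_1`, `function_1` and `function_2`, again summing `z` ints:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical worker taking a va_list: sums the z ints that follow. */
static int vfunction_1(int z, va_list ap)
{
    int sum = 0;

    assert(z >= 0);
    for (int i = 0; i < z; i++)
        sum += va_arg(ap, int);
    return sum;
}

/* Variadic front end, analogous to printf wrapping vprintf. */
static int function_1(int z, ...)
{
    va_list ap;
    va_start(ap, z);
    int r = vfunction_1(z, ap);
    va_end(ap);
    return r;
}

/* A second variadic function forwards its arguments to the va_list
   worker, not to the variadic function_1 itself.  */
static int function_2(int z, ...)
{
    va_list ap;
    va_start(ap, z);
    int r = vfunction_1(z, ap);
    va_end(ap);
    return r;
}
```

Each variadic entry point owns its `va_start`/`va_end` pair; only the `va_list` crosses the call boundary.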
Re: Passing va_args...
> > I'am looking for some way to pass variable arguments to another
> > function that receives variable arguments without using va_list.
>
> This is impossible.

USL C has a very neat construct called '&...' which was designed for exactly this purpose. One day when I have idle cycles (yeah, right) I will look into adding this as an extension.

Kean
bitmaps in gcc
In reference to this on the wiki:

> Bitmaps, also called sparse bit sets, are implemented using a linked
> list with a cache. This is probably not the most time-efficient
> representation, and it is not unusual for bitmap functions to show up
> high on the execution profile. Bitmaps are used for many things, such
> as for live register sets at the entry and exit of basic blocks in RTL,
> or for a great number of data flow problems. See bitmap.c (and
> sbitmap.c for GCC's simple bitmap implementation).

Can someone point me to a testcase where bitmap functions show up high on the profile?

Can anyone give me some background on the use of bitmaps in gcc?
Are they assumed to be sparse?
How critical is the memory consumption of bitsets?
What operations are the most speed critical?
Would it be desirable to merge bitmap and sbitmap into one data structure?
Anyone have good ideas for improvements?
Anything else anyone would want to add?

I think I may take a look at this. Once I figure out the requirements maybe we can speed it up a bit.

Brian N. Makin
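To make the wiki's description concrete: a linked-list sparse bitmap stores only the non-empty chunks, kept sorted by chunk index, plus a pointer to the most recently used chunk so that lookups near the previous one can skip the list walk. This is a simplified sketch with hypothetical names (`sbm_*`, 64-bit chunks), not the actual element layout of GCC's bitmap.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chunk: covers 64 consecutive bit positions. */
struct sbm_elt {
    struct sbm_elt *next;
    unsigned indx;          /* bit / 64 */
    uint64_t bits;
};

struct sbm {
    struct sbm_elt *first;  /* chunks sorted by indx */
    struct sbm_elt *cache;  /* chunk touched most recently */
};

/* Return the chunk for indx, or NULL.  When NULL is returned, *prev is
   the last chunk with a smaller indx (NULL means insert at the head). */
static struct sbm_elt *sbm_find(struct sbm *m, unsigned indx,
                                struct sbm_elt **prev)
{
    struct sbm_elt *p = NULL, *e = m->first;

    /* Resume from the cached chunk when it does not lie past indx. */
    if (m->cache && m->cache->indx <= indx) {
        if (m->cache->indx == indx) {
            *prev = NULL;
            return m->cache;
        }
        p = m->cache;
        e = p->next;
    }
    while (e && e->indx < indx) {
        p = e;
        e = e->next;
    }
    *prev = p;
    return (e && e->indx == indx) ? e : NULL;
}

static void sbm_set(struct sbm *m, unsigned bit)
{
    struct sbm_elt *prev;
    struct sbm_elt *e = sbm_find(m, bit / 64, &prev);

    if (!e) {
        e = calloc(1, sizeof *e);
        assert(e != NULL);
        e->indx = bit / 64;
        /* Splice in after prev (or at the head) to keep the list sorted. */
        e->next = prev ? prev->next : m->first;
        if (prev)
            prev->next = e;
        else
            m->first = e;
    }
    e->bits |= (uint64_t)1 << (bit % 64);
    m->cache = e;
}

static bool sbm_test(struct sbm *m, unsigned bit)
{
    struct sbm_elt *prev;
    struct sbm_elt *e = sbm_find(m, bit / 64, &prev);

    if (!e)
        return false;
    m->cache = e;
    return (e->bits >> (bit % 64)) & 1;
}

static void sbm_clear_all(struct sbm *m)
{
    struct sbm_elt *e = m->first;

    while (e) {
        struct sbm_elt *n = e->next;
        free(e);
        e = n;
    }
    m->first = m->cache = NULL;
}
```

The cache makes sequential access cheap, but a lookup far before the cached chunk still walks from the head, which is one reason such bitmaps can show up in profiles.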
Re: bitmaps in gcc
Brian Makin <[EMAIL PROTECTED]> writes: > Bitmaps, also called sparse bit sets, are implemented > using a linked list with a cache. This is probably not > the most time-efficient representation, and it is not > unusual for bitmap functions to show up high on the > execution profile. Bitmaps are used for many things, > such as for live register sets at the entry and exit > of basic blocks in RTL, or for a great number of data > flow problems. See bitmap.c (and sbitmap.c for GCC's > simple bitmap implementation). > > Can someone point me to a testcase where bitmap > functions show up high on the profile? PR 8361. Ian
Severe problems with vectorizing stuff in 4.0.3 HEAD
All,

I am getting a lot of test suite failures with almost all of the vect/* tests. I am using pr18400.c from the test suite as an example here, because it's about the smallest one I can find. Here is what is generated at -O2:

	.file	"pr18400.c"
	.version	"01.01"
	.text
	.align 16
.globl sig_ill_handler
	.type	sig_ill_handler, @function
sig_ill_handler:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$20, %esp
	pushl	$0
	call	exit
	.size	sig_ill_handler, .-sig_ill_handler
	.align 16
.globl check_vect
	.type	check_vect, @function
check_vect:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$16, %esp
	pushl	$sig_ill_handler
	pushl	$4
	call	signal
/APP
	.byte 0xf2,0x0f,0x10,0xc0
/NO_APP
	popl	%eax
	popl	%edx
	pushl	$0
	pushl	$4
	call	signal
	addl	$16, %esp
	leave
	ret
	.size	check_vect, .-check_vect
	.section	.rodata
	.align 32
	.type	C.0.1905, @object
	.size	C.0.1905, 32
C.0.1905:
	.long	0
	.long	3
	.long	6
	.long	9
	.long	12
	.long	15
	.long	18
	.long	21
	.text
	.align 16
.globl main1
	.type	main1, @function
main1:
	pushl	%ebp
	movl	$8, %ecx
	movl	%esp, %ebp
	pushl	%edi
	cld
	pushl	%esi
	leal	-40(%ebp), %edi
	subl	$64, %esp
	movl	$C.0.1905, %esi
	rep movsl
	xorl	%edx, %edx
	leal	-40(%ebp), %esi
	leal	-72(%ebp), %ecx
	.align 16
.L6:
	leal	0(,%edx,4), %eax
	addl	$4, %edx
	cmpl	$8, %edx
*** At this point, the registers have the following values:
*** %eax = 0, %ecx = 0x8047d84, %edx = 4, %ebx = 0x8047dec
*** %esi = 0x8047da4, %edi = 0x8047dc4, %ebp = 0x8047dcc
*** This is guaranteed to cause a SIGSEGV, and it does, because
*** %esi is not aligned on a 16-byte boundary. But ... see below ...
	movdqa	(%esi,%eax), %xmm0
	movdqa	%xmm0, (%ecx,%eax)
	jne	.L6
	movb	$1, %dl
	.align 16
.L8:
	movl	-4(%ecx,%edx,4), %eax
	cmpl	-4(%esi,%edx,4), %eax
	jne	.L18
	incl	%edx
	cmpl	$9, %edx
	jne	.L8
	addl	$64, %esp
	xorl	%eax, %eax
	popl	%esi
	popl	%edi
	popl	%ebp
	ret
.L18:
	call	abort
	.size	main1, .-main1
	.align 16
.globl main
	.type	main, @function
main:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ecx
	pushl	%ecx
	andl	$-16, %esp
	subl	$16, %esp
	call	check_vect
	leave
*** Looks like it was trying to align the stack on a 16-byte
*** boundary here. But on entry into main1(), it's doing 3
*** pushes at the beginning. Thus the offsets into the stack
*** (like the leal -40(%ebp), %edi close to the top of main1)
*** appear to be incorrectly calculated.
	jmp	main1
	.size	main, .-main
	.ident	"GCC: (GNU) 4.0.3 20051013 (prerelease)"

That's the first problem. I then compiled with -O6, and got this:

	.file	"pr18400.c"
	.version	"01.01"
	.text
	.align 16
.globl sig_ill_handler
	.type	sig_ill_handler, @function
sig_ill_handler:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$20, %esp
	pushl	$0
	call	exit
	.size	sig_ill_handler, .-sig_ill_handler
	.align 16
.globl check_vect
	.type	check_vect, @function
check_vect:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$16, %esp
	pushl	$sig_ill_handler
	pushl	$4
	call	signal
/APP
	.byte 0xf2,0x0f,0x10,0xc0
/NO_APP
	popl	%eax
	popl	%edx
	pushl	$0
	pushl	$4
	call	signal
	addl	$16, %esp
	leave
	ret
	.size	check_vect, .-check_vect
	.section	.rodata
	.align 32
	.type	C.0.1905, @object
	.size	C.0.1905, 32
C.0.1905:
	.long	0
	.long	3
	.long	6
	.long	9
	.long	12
	.long	15
	.long	18
	.long	21
	.text
	.align 16
.globl main1
	.type	main1, @function
main1:
	pushl	%ebp
	movl	$8, %ecx
	movl	%esp, %ebp
	pushl	%edi
	cld
	pushl	%esi
	leal	-40(%ebp), %edi
	subl	$64, %esp
	movl	$C.0.1905, %esi
	rep movsl
	xorl	%edx, %edx
	leal	-40(%ebp), %esi
	leal	-72(%ebp), %ecx
	.align 16
.L6:
	leal	0(,%edx,4), %eax
	addl	$4
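movdqa faults unless its memory operand is 16-byte aligned. A tiny helper (hypothetical name `is_aligned`) applied to the register values quoted in the listing's comment shows that both %esi and %ecx sit 4 bytes past a 16-byte boundary on the first loop iteration (%eax = 0):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

For example, `is_aligned(0x8047da4, 16)` is false because 0x8047da4 & 15 == 4; the nearest aligned address below it is 0x8047da0.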
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Oct 14, 2005, at 3:11 PM, Kean Johnston wrote: All, I am getting a lot of test suite failures with almost all of the vect/* tests. I am using pr18400.c from the test suite as an example here, becuase its about the smallest one I can find. Here is what is generated at -O2: Can you just fix your OS instead? -- Pinski
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Fri, Oct 14, 2005 at 03:13:55PM -0400, Andrew Pinski wrote: > > On Oct 14, 2005, at 3:11 PM, Kean Johnston wrote: > > >All, > > > >I am getting a lot of test suite failures with almost all of > >the vect/* tests. I am using pr18400.c from the test suite > >as an example here, becuase its about the smallest one I > >can find. Here is what is generated at -O2: > > > Can you just fix your OS instead? Could you possibly give an even less helpful response? Please, make an effort to be polite and helpful if you're going to answer questions. If you're saying that Kean's assumptions about the incoming stack alignment are wrong, why? What should it be? Note, I don't know or care about the answer. I'm just annoyed at your tone. -- Daniel Jacobowitz CodeSourcery, LLC
Re: many libgcc in MacOS X 10.4 build
On Oct 13, 2005, at 8:21 PM, Jack Howarth wrote: What exactly are all of the new libgcc versions created when building the current gcc cvs on MacOS X 10.4. They allow targeting different OS versions with one compiler, from the doc: @item [EMAIL PROTECTED] The earliest version of MacOS X that this executable will run on is @var{version}. Typical values of @var{version} include @code{10.1}, @code{10.2}, and @code{10.3.9}. The default for this option is to make choices that seem to be most useful. The ppc64 one is for 64-bit code...
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Oct 14, 2005, at 3:11 PM, Kean Johnston wrote:
> All,
>
> I am getting a lot of test suite failures with almost all of
> the vect/* tests. I am using pr18400.c from the test suite
> as an example here, becuase its about the smallest one I
> can find. Here is what is generated at -O2:

Can you try -fno-optimize-sibling-calls and see if that works? If so, then the problem is that sibling calls should not be done in main. To make the testcase correct anyway, put "return 0;" after the call to main1, since right now the return value of main could be anything.

Thanks,
Andrew Pinski
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
> Can you just fix your OS instead?

My OS is just fine, thank you very much. You disappoint me. I expected better from you.

It is most likely of absolutely no consequence to you, and this has nothing to do with GCC, so this is the very last I will say on this subject to you, but you really did hurt me with your comment. Hurt *me*, specifically, the person trying to help out with GCC. If you are upset at what other people in my company are doing, I would humbly request you take it up with them and leave me out of it. Thank you.

Kean
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
> Can you try -fno-optimize-sibling-calls and see if that works?

Yes, it did, thank you.

> If so, then the problem is that we sibling calls should not be done
> in main. To fix the testcase anyways to be correct is to put
> "return 0;" after the call to main1. Since right now the return of
> main could be anything.

I don't think the test case is incorrect. In a nutshell, it has:

int main1()
{
  ... stuff ...
  return 0;
}

int main(void)
{
  return main1();
}

That should be perfectly valid, and return 0 from main1, which subsequently returns 0 from main.

What does the fact that -fno-optimize-sibling-calls worked really indicate? Without that option, something really does seem to be miscalculating the stack offsets by 4. What may be of interest here is that aside from the vect/* tests, the only other test that is failing is sibcall-6.

Kean
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Oct 14, 2005, at 3:43 PM, Kean Johnston wrote:
> What does the fact that -fno-optimize-sibling-calls worked
> indicate really? Without that option something really does
> seem to be mis-calculating the stack offsets by 4. What may
> be of interest here is that aside from the vect/* tests, the
> only other test that is failing is sibcall-6.

It indicates that the sibling call optimization in main should be disabled for targets that need to increase the stack alignment; otherwise you get a lower stack alignment than is required. You have to look at what changed between 3.4.0 and 4.0.0 to cause this, since it is a regression. I think the issue is that we are detecting them at the tree level but not rejecting them when expanding, so you have to look at the expand functions for that.

Also, main1 should be marked as noinline to make sure the tests work correctly at -O3 and above.

The reason nobody noticed this before is that most x86 OSes nowadays align the stack to 16 bytes on entry to main, which is what my comment about fixing your OS was about; it was more of a joke than anything else.

sibcall-6 is a different issue, really, and unrelated to the problem you are currently looking into.

-- 
Pinski
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Oct 14, 2005, at 3:55 PM, Andrew Pinski wrote: On Oct 14, 2005, at 3:43 PM, Kean Johnston wrote: What does the fact that -fno-optimize-sibling-calls worked indicate really? Without that option something really does seem to be mis-calculating the stack offsets by 4. What may be of interest here is that aside from the vect/* tests, the only other test that is failing is sibcall-6. It indicated that sibling calling optimization in main should be disabled for targets that need to up the stack alignment, otherwise you get the stack alignment of a lower one than that is required. You have to look to see what changed between 3.4.0 and 4.0.0 that caused this since it is a regression. I think the issue is that we are detecting them at the tree level but not rejecting them when expanding. So you have to look at the expand functions for that. Note I filed PR 24374 for this problem since it is a regression from 3.4.0. Thanks, Andrew Pinski
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
> It indicated that sibling calling optimization in main should
> be disabled for targets that need to up the stack alignment,
> otherwise you get the stack alignment of a lower one than

While that may be true, I think the problem is broader.

I took out the main1() function and put it into a separate file, and compiled just that. So now there is no carnal knowledge of main or its stack alignment. The generated code for this stand-alone main1() makes no attempt to align the stack or the stack variables it is going to be passing to the movdqa instruction. Unless that's what you mean by:

> that is required. You have to look to see what changed between
> 3.4.0 and 4.0.0 that caused this since it is a regression.
> I think the issue is that we are detecting them at the tree level
> but not rejecting them when expanding. So you have to look
> at the expand functions for that.

You're using internals verbiage that's beyond me :) I'm a simple porter; I have very little understanding of the actual internals of GCC.

> The reason why nobody notices this before is because most x86 OS's
> now a days align their stack going into main as 16byte aligned
> which was what my comment about fixing your OS was about, it was
> more of a joke rather than anything else.

OK, I apologise, Andrew. I took it as a SCO-bash. My bad.

However, I don't think the stack being aligned on a 16-byte boundary into main will help, unless GCC is assuming (and I don't see how it possibly could) that every function would likewise be aligned. The fact that a stand-alone version of main1() was not correctly aligned leads me to believe that the real error is that gcc is not making an attempt to align the stack variables for use by the alignment-sensitive vector insns.

Also, when you say "stack going into main is 16 byte aligned", what specifically do you mean? That it's 16-byte aligned before the call to main() itself? That at the first insn in main, most likely a push %ebp, it's 16-byte aligned (i.e. does the call to main from crt1.o have to take the push of the return address into account)?

Kean

PS, here is the generated assembly for main1() as a stand-alone function, with nothing else defined in the .c file:

	.file	"foo.c"
	.version	"01.01"
	.section	.rodata
	.align 32
	.type	C.0.1458, @object
	.size	C.0.1458, 32
C.0.1458:
	.long	0
	.long	3
	.long	6
	.long	9
	.long	12
	.long	15
	.long	18
	.long	21
	.text
	.align 16
.globl main1
	.type	main1, @function
main1:
	pushl	%ebp
	movl	$8, %ecx
	movl	%esp, %ebp
	pushl	%edi
	cld
	pushl	%esi
	leal	-40(%ebp), %edi
	subl	$64, %esp
	movl	$C.0.1458, %esi
	rep movsl
	xorl	%edx, %edx
	leal	-40(%ebp), %esi
	leal	-72(%ebp), %ecx
	.align 16
.L2:
	leal	0(,%edx,4), %eax
	addl	$4, %edx
	cmpl	$8, %edx
	movdqa	(%esi,%eax), %xmm0
	movdqa	%xmm0, (%ecx,%eax)
	jne	.L2
	movb	$1, %dl
	.align 16
.L4:
	movl	-4(%ecx,%edx,4), %eax
	cmpl	-4(%esi,%edx,4), %eax
	jne	.L14
	incl	%edx
	cmpl	$9, %edx
	jne	.L4
	addl	$64, %esp
	xorl	%eax, %eax
	popl	%esi
	popl	%edi
	popl	%ebp
	ret
.L14:
	call	abort
	.size	main1, .-main1
	.ident	"GCC: (GNU) 4.0.3 20051013 (prerelease)"

# cat foo.c
#define N 8

int main1 ()
{
  int b[N] = {0,3,6,9,12,15,18,21};
  int a[N];
  int i;

  for (i = 0; i < N; i++)
    {
      a[i] = b[i];
    }

  /* check results:  */
  for (i = 0; i < N; i++)
    {
      if (a[i] != b[i])
        abort ();
    }

  return 0;
}
Re: bitmaps in gcc
On Fri, Oct 14, 2005 at 11:27:15AM -0700, Brian Makin wrote:
> Can anyone give me some background on the use of
> bitmaps in gcc?

There's plenty of material in the list archives. This is not the first time that this topic has come up.

r~
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Fri, Oct 14, 2005 at 01:43:03PM -0700, Kean Johnston wrote: > >It indicated that sibling calling optimization in main should > >be disabled for targets that need to up the stack alignment, > >otherwise you get the stack alignment of a lower one than > While that may be true, I think the problem is broader. > > I took out the main1() function and put it into a separate > file, and compiled just that. So now there is no carnal > knowledge of main or its stack alignment. The generated > code for this stand-alone main1() makes no attempt to > align the stack or the stack variables it is going to be > passing to the movdqa instruction. Main is supposed to emit special code to handle aligning the stack, if the OS doesn't do it already. Other functions don't get this treatment. It may be that you need to define something you haven't already for your port if you aren't getting the compensation code in main. -- Daniel Jacobowitz CodeSourcery, LLC
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Fri, Oct 14, 2005 at 01:43:03PM -0700, Kean Johnston wrote: > However, I dont think the stack being aligned on a 16-byte > boundary into main will help, unless GCC is assuming (and I > dont see how it possibly could) that every function would > likewise be aligned. Yes, in fact that's *exactly* what GCC is assuming. And it will be true for all code that GCC generates. > Also, when you say "stack going into main is 16 byte aligned", > what specifically do you mean? That the first argument to main is 16-byte aligned. This implies that the return address from main is at addr % 16 == 12. If you're able to fix this in your OS, don't forget to do the same with the pthread entry function and signal handlers. r~
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Fri, Oct 14, 2005 at 01:43:03PM -0700, Kean Johnston wrote:
> Also, when you say "stack going into main is 16 byte aligned",
> what specifically do you mean? that its 16-byte aligned before
> the call to main() itself? That at the first insn in main, most
> likely a push %ebp, its 16-byte aligned (i.e does the call
> to main from crt1.o have to take the push of the return address
> into account)?

The stack alignment is computed before the saved EIP is pushed on the stack by a CALL instruction. So on function entry, ESP has already been decremented by 4 from its 16-byte alignment. Conventionally EBP is then pushed, leaving ESP 8 bytes off its 16-byte alignment.

If your ABI does not require 16-byte stack frame alignment, aligning the stack correctly in main() will not fix all the problems unless you can recompile all of the code (and fix all the hand-written assembly) on the entire system. If you're 16-byte aligned and you call into a library that only requires 4-byte alignment (the traditional SysV x86 ABI; Pinski says it's been updated to require 16-byte alignment, but I don't know when that happened), and that library function calls into a newly gcc-recompiled function, you can crash over there because of a misaligned operation.

J
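The offset arithmetic described in the last two replies can be checked with a few lines of C. This models only the %esp bookkeeping, assuming the caller's %esp is a multiple of 16 immediately before the CALL (i.e. the argument area is 16-byte aligned); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of i386 %esp at function entry: start from a caller %esp that
   is 16-byte aligned just before CALL; CALL pushes the 4-byte return
   address, then the prologue pushes 'pushes' more 4-byte words
   (e.g. 1 for push %ebp).  Returns the resulting %esp modulo 16.  */
static unsigned esp_mod16_after(uint32_t aligned_esp, unsigned pushes)
{
    assert(aligned_esp % 16 == 0);
    uint32_t esp = aligned_esp - 4;   /* CALL pushes the saved EIP */
    esp -= 4u * pushes;               /* prologue pushes */
    return esp % 16;
}
```

With no pushes the return-address slot is at addr % 16 == 12; after `push %ebp` %esp is 8 bytes off; after three 4-byte pushes it is back on a 16-byte boundary.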
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
> Yes, in fact that's *exactly* what GCC is assuming. And it will be
> true for all code that GCC generates.

How can that possibly ever work? Is the assumption then that the only code GCC will ever work with is code that GCC compiled? In effect, what this implies is that GCC is redefining the ABI. It also means it is impossible for GCC to interoperate with vendor-supplied libraries like libc. If I use a libc function that has a callback, like ftw() or bsearch() or qsort(), then I cannot have it call a function that was compiled with gcc, because no previously defined ABI has made it a requirement for every stack frame to be 16-byte aligned.

Kean
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Oct 14, 2005, at 4:43 PM, Kean Johnston wrote:
>> It indicated that sibling calling optimization in main should
>> be disabled for targets that need to up the stack alignment,
>> otherwise you get the stack alignment of a lower one than
> While that may be true, I think the problem is broader.

This is the way it is designed. GCC assumes at every entry point except main that the stack has been aligned to 16 bytes. There was a bug in Bugzilla about this before, and the powers that be (not me) decided that this was by design, and that it is the OS which needs to make sure that the stack is 16-byte aligned, even though the original SYSV ABI does not require this. The SYSV ABI does not take SSE into account; it does not even mention it at all. I had wished that someone would update the SYSV ABI (as was done for AltiVec) to talk about SSE/SSE2 and how arguments are passed. Instead GCC just follows what Intel's compiler does (which is known to have changed). Almost all other x86 OSes have changed to take the bigger alignment into account, which is why the powers that be decided that this is by design and that we only change the alignment when invoking main. I assume you will have the same issues with threads and their stack alignment.

Maybe it is time to update the SYSV ABI for SSE and have SCO (and other companies) update their OS for the new ABI.

Note: I am not saying that GCC should not be fixed for the case of main; that needs to be fixed, at least until everyone agrees on an ABI which takes SSE into account. Luckily, x86_64 does not have this trouble.

Here is the current problem with main: we realign the stack to 16-byte alignment and call the checking function. Then we leave main (destroying the stack alignment) and jump to main1 (this is what is meant by the sibling call optimization). Instead of jumping directly to main1, we should have made it a call, which keeps the stack aligned correctly.

-- 
Pinski
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Fri, Oct 14, 2005 at 02:14:10PM -0700, Kean Johnston wrote:
> How can that possibly ever work? Is the assumption then
> that the only code GCC will ever work with is code that
> GCC compiled? In effect what this implies is that GCC
> is re-defining the ABI.

Yes. You can thank Intel for this. With the introduction of SSE1, something had to change in order to satisfy hardware constraints. Intel initially proposed a scheme that performed dynamic stack alignment in functions that use SSE1 instructions, with multiple entry points to avoid redundant realignments. This turned out to be horribly complex, and in many cases resulted in yet another register being unavailable to user code, leaving at times only 4 (!) with pic+alloca+sse. That kind of solution was unacceptably costly, so we simply changed the default code generation scheme to maintain proper stack alignment at all times.

GCC code will interoperate with other compilers if you don't use the 128-bit vector modes, but if you do, then we *require* that the stack be kept aligned. This has been the status quo since 1998 or 1999, and is unlikely to change.

> It also means it is impossible
> for GCC to inter-operate with vendor supplied libraries
> like libc.

Yep. Too bad, so sad. If you have users that actually need to use SSE in callbacks from bsearch/qsort or the like, then they'll have to write

int __attribute__((noinline))
real_callback(const void *a, const void *b)
{
  ...
}

int callback(const void *a, const void *b)
{
  void * volatile x = __builtin_alloca(1);
  return real_callback (a, b);
}

r~
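A sketch of how rth's wrapper pattern might be wired into qsort. Assumptions: the comparison body is a hypothetical stand-in for code that would use SSE, `sort_ints` is a hypothetical helper, and the claim that the `__builtin_alloca` wrapper re-establishes the alignment GCC expects comes from the message above and is not verified here. On a system whose libc already keeps 16-byte alignment, the code simply sorts.

```c
#include <assert.h>
#include <stdlib.h>

/* The real comparison, kept out of line.  The int comparison is a
   hypothetical stand-in for a body that might use SSE.  */
static int __attribute__((noinline))
real_callback(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Entry point handed to libc.  Per the suggestion in the thread, the
   alloca is meant to make this wrapper fix up the stack before calling
   real_callback; volatile keeps the allocation from being removed.  */
static int callback(const void *a, const void *b)
{
    void * volatile x = __builtin_alloca(1);
    (void)x;
    return real_callback(a, b);
}

/* Hypothetical helper: sort n ints through the wrapped callback. */
static void sort_ints(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    qsort(v, n, sizeof *v, callback);
}
```

The split keeps the alignment-sensitive code in `real_callback`, which only ever runs below the wrapper's frame.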
RE: bitmaps in gcc
One of the classic places where sparse bitmaps were used in GCC in the past is the register allocation phase, where you essentially have a 2D sparse matrix with the basic blocks on one axis and the pseudo register numbers on the other. When you are compiling very large functions, the number of basic blocks and the number of pseudo registers are both very large, and if the table wasn't compressed (most registers aren't live past a single basic block), its size was very significant. I haven't looked at this area in the last two years or so, when I wasn't working on GCC, so it might have changed. Unfortunately, I don't recall whether we were using compressed bitmaps before I wrote the original versions of the compressed bit vectors, but the idea was to encapsulate everything within macros so the representation could be changed in the future.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brian Makin
Sent: Friday, October 14, 2005 1:27 PM
To: gcc@gcc.gnu.org
Subject: bitmaps in gcc

In reference to this on the wiki:

> Bitmaps, also called sparse bit sets, are implemented using a linked list with a cache. This is probably not the most time-efficient representation, and it is not unusual for bitmap functions to show up high on the execution profile. Bitmaps are used for many things, such as for live register sets at the entry and exit of basic blocks in RTL, or for a great number of data flow problems. See bitmap.c (and sbitmap.c for GCC's simple bitmap implementation).

Can someone point me to a testcase where bitmap functions show up high on the profile?
Can anyone give me some background on the use of bitmaps in gcc?
Are they assumed to be sparse?
How critical is the memory consumption of bitsets?
What operations are the most speed critical?
Would it be desirable to merge bitmap and sbitmap into one data structure?
Anyone have good ideas for improvements?
Anything else anyone would want to add?

I think I may take a look at this.
Once I figure out the requirements, maybe we can speed it up a bit.

Brian N. Makin
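The wiki text describes bitmaps as a linked list with a cache. A simplified sketch of that idea follows; the 128-bit element size, the names (sparse_bitmap, sbm_set_bit, sbm_bit_p and so on) and the single-element cache are assumptions for illustration, not the layout of GCC's bitmap.c. Only the 128-bit chunks that contain set bits are allocated, which is what keeps a basic-block-by-pseudo matrix affordable when most registers are live in a single block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ELT_BITS 128u   /* bit positions covered by one element */
#define WORD_BITS 64u

typedef struct elt {
    struct elt *next;                   /* list is kept sorted by indx */
    unsigned indx;                      /* bit number / ELT_BITS */
    uint64_t bits[ELT_BITS / WORD_BITS];
} elt;

typedef struct {
    elt *first;
    elt *current;                       /* one-entry cache: last element used */
} sparse_bitmap;

/* Return the element for indx, or NULL.  When NULL is returned, *prev is
   the last element with a smaller indx (NULL if none).  The walk starts
   at the cached element unless that one is already past indx. */
static elt *lookup(sparse_bitmap *bm, unsigned indx, elt **prev)
{
    elt *p = bm->first, *before = NULL;
    if (bm->current && bm->current->indx <= indx)
        p = bm->current;
    while (p && p->indx < indx) {
        before = p;
        p = p->next;
    }
    *prev = before;
    return (p && p->indx == indx) ? p : NULL;
}

/* Set bit number `bit`; returns 0 on success, -1 if allocation fails. */
int sbm_set_bit(sparse_bitmap *bm, unsigned bit)
{
    unsigned indx = bit / ELT_BITS, off = bit % ELT_BITS;
    elt *prev;
    elt *e = lookup(bm, indx, &prev);
    if (!e) {
        e = calloc(1, sizeof *e);
        if (!e)
            return -1;
        e->indx = indx;
        if (prev) {
            e->next = prev->next;
            prev->next = e;
        } else {
            e->next = bm->first;
            bm->first = e;
        }
        assert(e->next == NULL || e->next->indx > e->indx);
    }
    e->bits[off / WORD_BITS] |= (uint64_t)1 << (off % WORD_BITS);
    bm->current = e;
    return 0;
}

/* Return 1 if bit number `bit` is set, 0 otherwise. */
int sbm_bit_p(sparse_bitmap *bm, unsigned bit)
{
    unsigned indx = bit / ELT_BITS, off = bit % ELT_BITS;
    elt *prev;
    elt *e = lookup(bm, indx, &prev);
    if (!e)
        return 0;
    bm->current = e;
    return (int)((e->bits[off / WORD_BITS] >> (off % WORD_BITS)) & 1u);
}

/* Number of allocated elements: a rough measure of memory use. */
size_t sbm_num_elts(const sparse_bitmap *bm)
{
    size_t n = 0;
    for (const elt *p = bm->first; p; p = p->next)
        n++;
    return n;
}

/* Free all elements and reset the bitmap to empty. */
void sbm_clear(sparse_bitmap *bm)
{
    elt *p = bm->first;
    while (p) {
        elt *next = p->next;
        free(p);
        p = next;
    }
    bm->first = bm->current = NULL;
}
```

The cache helps when consecutive queries touch nearby bit numbers; a scattered access pattern still walks the list from the head, which is one way such functions can end up high in a profile.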
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
> Yes. You can thank Intel for this.

Thank you Intel :)

> With the introduction of SSE1, something had to change in order to satisfy hardware constraints. Intel initially proposed some scheme that performed dynamic stack alignment in functions that use SSE1 instructions, and multiple entry points to avoid redundant realignments.

OK, I am no compiler expert, so this may be totally impossible, and if so I'd appreciate an education, but this is what I instinctively thought of when first thinking about this problem. There are a very limited number of instructions that require 16-byte alignment. The two main places you have to worry about that alignment are when passing arguments to a function and for local stack variables. I guess the compiler is safe to assume that if you are using normal memory, say from malloc(), to hold the alignment-sensitive data, you have done your own alignment.

So let's take the first case. You have some code that is going to pass some vector parameter, something that is alignment-conscious. I am assuming the compiler knows that. Before pushing the data onto the stack, the compiler could arrange things such that those parameters would be neatly aligned on a 16-byte boundary. The only assumption that would need to hold true is that the called function was also compiled with gcc. Using imaginary data types here, where int128_t is alignment-sensitive, suppose we had:

  int func (int128_t x, int y, int128_t z) { ... }

  int otherfunc (void)
  {
    int128_t foo = 123;
    int128_t bar = 234;
    return func(foo, 0, bar);
  }

When generating the call to func(), could gcc not align the stack to 16 bytes, such that the first argument is properly aligned? You then push the next argument, a simple 4-byte int, and then re-align to the next 16-byte boundary for the third argument. The code in func() could take this scheme into account and know the exact offsets into the stack that it needs to get to the args.
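Kean's first case amounts to padding each argument to its own alignment inside an area the caller has aligned. A minimal sketch of that offset arithmetic, assuming a 16-byte int128_t, a 4-byte int, and offsets measured from a base the caller has already aligned to 16 bytes; layout_args and round_up are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Round off up to the next multiple of align, which must be a power of two. */
static size_t round_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Hypothetical layout pass: give each of the n arguments the lowest
   offset (from a 16-byte-aligned base) that satisfies its alignment.
   Writes the offsets to offsets[] and returns the size of the whole
   argument area, rounded up to 16 bytes. */
size_t layout_args(const size_t *sizes, const size_t *aligns, size_t n,
                   size_t *offsets)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        off = round_up(off, aligns[i]);
        offsets[i] = off;
        off += sizes[i];
    }
    return round_up(off, 16);
}
```

For func(x, y, z) this gives offsets 0, 16 and 32 and a 48-byte area, with 12 bytes of padding after y. Both caller and callee must agree on this rule, which is the "also compiled with gcc" assumption in the message.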
The second case is where you have function-local variables that are alignment-constrained. In this case, wouldn't a simple analysis of the function contents determine whether or not any alignment-specific insns are being used, and if so, automatically align the stack to 16 bytes? That way, only those functions that actually use such insns pay the (small) penalty of rounding up the stack. Of course, all of that could be conditional on targets that don't always align functions on a 16-byte boundary. This seems far less invasive than redefining an ABI.

> GCC code will interoperate with other compilers if you don't use the 128-bit vector modes, but if you do, then we *require* that the stack be maintained aligned.

I think, and I may be wrong here, that if I simply make sure that entry to main is correctly aligned, then the majority of code will just work. Assuming I am compiling some program with gcc, if main is correctly aligned, and all gcc code goes to lengths to ensure that alignment, then the only time it can get *out* of alignment is if gcc code has made a call to non-aligned libc code, which in turn makes calls back into gcc code (a la qsort, ftw, etc.). Those cases are relatively rare. The only other time it's likely to be an issue is with signal delivery, and I am pretty certain I can persuade the kernel folks to ensure that the stack frame is always aligned to 16 bytes when that happens.

Kean
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
On Fri, Oct 14, 2005 at 04:35:48PM -0700, Kean Johnston wrote:
> This seems far less invasive that redefining an ABI.

No, it isn't less invasive.

Your first case is not too difficult. No more difficult, really, than supporting alloca. Indeed, this is more or less exactly the code we emit in main. (Which got bypassed with the tailcall.)

Your second case results in a variable displacement between the local stack frame and the incoming function arguments. This means we need an extra register to hold this displacement (or, equivalently, a pointer to the base of the arguments). Combine this with -fpic and alloca and 4 of the 8 general registers are consumed. At this point, lots of code stops compiling.

You're coming into this discussion about 6 years late.

> I think, and I may be wrong here, but I think if I simply
> make sure that entry to main is correctly aligned, then
> the majority of code will just work.

Yes. Which is exactly why the dynamic alignment solutions were rejected. There's simply not enough gain.

> The only other time it's likely to be
> an issue is with signal delivery...

And the code in the thread libraries that calls the thread start routine.

r~
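Richard's objection to the second case can be checked with a little arithmetic. This sketch is a hypothetical model rather than GCC's prologue code: it assumes a downward-growing stack, a prologue that rounds the frame base down to 16 bytes, and an incoming stack pointer that is only guaranteed 4-byte alignment, as on i386. Because the distance from the realigned frame back to the arguments depends on the low bits of the incoming stack pointer, it cannot be encoded as a constant offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prologue model: the new frame base is the incoming stack
   pointer minus the frame size, rounded down to a 16-byte boundary (the
   stack grows down).  Returns the distance from that base back up to
   the incoming stack pointer, i.e. to where the caller left the
   arguments. */
size_t args_displacement(uintptr_t incoming_sp, size_t frame_size)
{
    uintptr_t base = (incoming_sp - frame_size) & ~(uintptr_t)15;
    assert(incoming_sp - base >= frame_size);
    return (size_t)(incoming_sp - base);
}

/* Count the distinct displacements seen for the four incoming stack
   pointer alignments possible when only 4-byte alignment is guaranteed.
   Any count above 1 means the offset is not a compile-time constant. */
unsigned distinct_displacements(size_t frame_size)
{
    unsigned seen = 0;                  /* bit k set: displacement frame_size + k */
    for (uintptr_t low = 0; low < 16; low += 4)
        seen |= 1u << (args_displacement(0x10000 + low, frame_size) - frame_size);
    unsigned count = 0;
    for (; seen != 0; seen &= seen - 1)
        count++;
    return count;
}
```

With a 32-byte frame the displacement is 32 when the incoming stack pointer is 16-byte aligned but 36 when it is off by 4, so the compiler has to keep the displacement (or the argument base) in a register: the extra register Richard counts against the eight i386 general registers.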
Re: Severe problems with vectorizing stuff in 4.0.3 HEAD
Kean Johnston wrote:
> Ok I am no compiler expert, so this may be totally impossible, and if so I'd appreciate an education, but this is what I instinctively thought of when first thinking about this problem.

Note that, never mind SSE1, even conventional 8-byte floating point, while not requiring alignment for correctness, sure does need it for efficiency.