Re: Understanding Scheduling
On 22 Mar 2010, at 17:44, Ian Bolton wrote: >> Enabling BB-reorder only if profile info is available, is not the >> right way to go. The compiler really doesn't place blocks in sane >> places without it -- and it shouldn't have to, either. For example if >> you split an edge at some point, the last thing you want to worry >> about, is where the new basic block is going to end up. >> >> There are actually a few bugs in Bugzilla about BB-reorder, FWIW. > > I've done a few searches in Bugzilla and am not sure if I have found > the BB reorder bugs you are referring to. > > The ones I have found are: > > 16797: Opportunity to remove unnecessary load instructions > 41396: missed space optimization related to basic block reorder > 21002: RTL prologue and basic-block reordering pessimizes delay-slot > filling. > > (If you can recall any others, I'd appreciate hearing of them.) > > Based on 41396, it looks like BB reorder is disabled for -Os. But > you said in your post above that "the compiler really doesn't place > blocks in sane places without it", so does that mean that we could > probably increase performance for -Os if BB reorder was (improved) > and enabled for -Os? Back with our old gcc 3.4 compiler we used to routinely compile our code -Os but with BB reordering enabled as it gave us a significant performance gain for a very small increase in code size (less than 2% code size impact from what I remember versus about a 5% performance win). With gcc 4.4 (where we are until 4.5 is out) I've been constantly frustrated by not being able to do BB reordering at -Os but equally our code sizes at -O2 have steadily shrunk so that it's only about 10% larger than -Os if we disable cache-line-aligning functions (but where -O2 performance is often in the range of 15% to 30% faster). 
I seem to remember some suggestions in the past that we might want something like a -Os2 that would generally optimize for size but would still enable some number of small code size expansions where the performance benefit was large (and BB reordering would be my favourite such case) - that's the optimization setting I'd like to see us use for almost everything at Ubicom. Cheers, Dave
About "STARTING_FRAME_OFFSET" definition
Hi all,

Can the "STARTING_FRAME_OFFSET" macro be defined as a non-constant value (one that changes with "current_function_args_size")?

The target processor uses "FP+offset" addressing with a positive "offset" (the stack grows upward, and the parameters on the stack grow downward). For example, for a call foo(arg1, arg2, arg3, arg4), after foo's prologue the stack looks like this:

    < low address
    | Incoming arg4    | <- FP
    | Incoming arg3    |
    | Incoming arg2    |
    | Incoming arg1    | <- ARG
    | return PC of foo |
    | saved regs       |
    | old FP           |
    | local var0       |
    < high address

"STARTING_FRAME_OFFSET" is the offset between FP and the first local variable. In this situation:

    STARTING_FRAME_OFFSET = current_function_args_size + size(PC in stack) + size(saved regs) + size(old FP)

So "STARTING_FRAME_OFFSET" depends on "current_function_args_size", which is a GCC internal variable.

Is this stack layout suitable?

Thanks!
redriver
Test Failures on sparc-rtems not repeatable by hand
Hi, There are a number of failures in my latest run of sparc-rtems4.10 but the ones I have gone back and run the executable by hand actually pass. I have no idea why this is happening and wondered if someone had some insight as to what I should look at next. From gcc.log Executing on host: /users/joel/test-gcc/b-gcc1-sparc/gcc/xgcc -B/users/joel/test-gcc/b-gcc1-sparc/gcc/ /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/snprintf-chk.c /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/snprintf-chk-lib.c /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/lib/main.c gcc_tg.o -w -O0 -DSTACK_SIZE=2048 -isystem /users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib/targ-include -isystem /users/joel/test-gcc/gcc-svn/newlib/libc/include -B/users/joel/test-gcc/install-svn/sparc-rtems4.10/sis/lib/ -specs bsp_specs -qrtems -mcpu=cypress -B/users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib/ -L/users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib /users/joel/test-gcc/b-gcc1-sparc/rtems_gcc_main.o -Wl,-wrap,exit -Wl,-wrap,_exit -Wl,-wrap,main -Wl,-wrap,abort -lm -o /users/joel/test-gcc/b-gcc1-sparc/gcc/testsuite/gcc/snprintf-chk.x0 (timeout = 300) PASS: gcc.c-torture/execute/builtins/snprintf-chk.c compilation, -O0 sparc-rtems4.10-run is /users/joel/test-gcc/install-svn/bin/sparc-rtems4.10-run Running /users/joel/test-gcc/b-gcc1-sparc/gcc/testsuite/gcc/snprintf-chk.x0 for maximum 60 seconds FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O0 So it compiles but apparently fails to run. OK so I run it by hand: $ /users/joel/test-gcc/install-svn/bin/sparc-rtems4.10-run snprintf-chk.x0 *** EXIT code 0 [j...@rtbf64b gcc]$ echo $? 0 Any suggestions on how to track down what is going wrong? Thanks. -- Joel Sherrill, Ph.D. 
Director of Research & Development, On-Line Applications Research
Huntsville AL 35805, (256) 722-9985
joel.sherr...@oarcorp.com
Ask me about RTEMS: a free RTOS. Support Available.
BB reorder forced off for -Os
Is there any reason why BB reorder has been disabled in bb-reorder.c for -Os, such that you can't even turn it on with -freorder-blocks? From what I've heard on this list in recent days, BB reorder gives good performance wins such that most people would still want it on even if it did increase code size a little. Cheers, Ian
[rfc] common location for plugins
For packages of GCC I would like to see a common location where plugins can be installed; currently a path to the plugin has to be given on the command line, which is likely to be different for different installations. What about -fplugin= (without the .so) meaning to search for the plugin in a default location like /plugins for the plugin? -fplugin=.so could also be used, but maybe would be ambiguous looking in the current dir, or the plugin dir. Matthias
Re: [rfc] common location for plugins
Matthias Klose wrote:
> For packages of GCC I would like to see a common location where plugins can be installed; currently a path to the plugin has to be given on the command line, which is likely to be different for different installations. What about -fplugin= (without the .so) meaning to search for the plugin in a default location like /plugins for the plugin? -fplugin=.so could also be used, but maybe would be ambiguous looking in the current dir, or the plugin dir.

I have posted a patch providing exactly that feature to gcc-patches@ many times, but nobody has reviewed it. The latest post of the patch is http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00644.html

Given Matthias's wish, I dare to ping that patch again.

Cheers.
-- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mines, sont seulement les miennes} ***
Re: About "STARTING_FRAME_OFFSET" definition
On 03/23/2010 05:55 AM, redriver jiang wrote:
> Hi all,
>
> Can this "STARTING_FRAME_OFFSET" macro be defined to be a non-constant
> value ( changes with the "current_function_args_size")?
>
> As the target process has "FP+offset" with postive "offset"( stack
> grows upward, and parameters in stack grows downward), for example,
>
> call foo( arg1, arg2, arg3,arg4), after foo's prologue, the stack is like
> this:
>
> < low address
> | Incoming arg4 | <-FP
> | Incoming arg3 |
> | Incoming arg2 |
> | Incoming arg1 | <---ARG
> | return PC of foo |
> | saved regs |
> | old FP|
> | local var0 |
> < high address
>
> "STARTING_FRAME_OFFSET" means the offset between FP and the first
> local variable, in this situation,
>
> STARTING_FRAME_OFFSE = current_function_args_size+ size(PC in stack) +
> size(saved regs) + size(old FP).
>
> so, "STARTING_FRAME_OFFSET" depends on the
> "current_function_args_size", which is a GCC internal variable.
>
> Is this stack layout suitable?

It's possible to create this stack layout, yes. STARTING_FRAME_OFFSET ought not to enter into it, I don't think. What you'll want instead is to have a separate "soft" frame_pointer_rtx and hard_frame_pointer_rtx. Then during register allocation you eliminate from the soft frame pointer to the hard frame pointer with an offset you calculate at that point. There are many examples of this in existing ports, including the i386 port.

The reason why you want to handle this via elimination rather than a fixed offset during initial rtl generation is your "saved regs" field there, which of course will vary in size depending on what registers get spilled. So I would begin with STARTING_FRAME_OFFSET=0 and have the soft frame pointer point to "local var0" in your picture.
Then your INITIAL_ELIMINATION_OFFSET function would map:

    ARG_POINTER_REGNUM -> HARD_FRAME_POINTER_REGNUM = -current_function_args_size
    FRAME_POINTER_REGNUM -> HARD_FRAME_POINTER_REGNUM = -(sizeof(saved_regs) + sizeof(FP) + sizeof(return PC) + current_function_args_size)

r~
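For readers who want to see the mapping above as code, here is a minimal standalone sketch. It is not taken from any GCC port: the register numbers, the frame_sizes structure and the function name are all hypothetical stand-ins, and a real port would compute these sizes from its own frame layout code after register allocation. The signs simply follow the two offsets given in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register numbers; a real port defines its own. */
enum {
  ARG_POINTER_REGNUM = 1,
  FRAME_POINTER_REGNUM = 2,        /* the "soft" frame pointer */
  HARD_FRAME_POINTER_REGNUM = 3
};

/* Hypothetical per-function frame sizes, in bytes. */
struct frame_sizes {
  long args_size;        /* stands in for current_function_args_size */
  long return_pc_size;
  long saved_regs_size;  /* only known once spills are decided */
  long old_fp_size;
};

/* Offset used when eliminating FROM in favour of TO, following the
   two mappings quoted in the message.  */
long
initial_elimination_offset (const struct frame_sizes *f, int from, int to)
{
  assert (f != NULL);
  assert (to == HARD_FRAME_POINTER_REGNUM);

  switch (from)
    {
    case ARG_POINTER_REGNUM:
      return -f->args_size;
    case FRAME_POINTER_REGNUM:
      return -(f->saved_regs_size + f->old_fp_size
               + f->return_pc_size + f->args_size);
    default:
      assert (0 && "unexpected elimination");
      return 0;
    }
}
```

Because saved_regs_size is an input here rather than a constant, the sketch also shows why the offset cannot be fixed during initial rtl generation.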
RE: BB reorder forced off for -Os
Does -Os mean "optimize even if it makes things a bit bigger" or does it mean "optimize only to make it smaller"? If the latter then the current behavior would appear to be the correct one. paul > -Original Message- > From: Ian Bolton [mailto:bol...@icerasemi.com] > Sent: Tuesday, March 23, 2010 2:06 PM > To: gcc@gcc.gnu.org > Subject: BB reorder forced off for -Os > > Is there any reason why BB reorder has been disabled > in bb-reorder.c for -Os, such that you can't even > turn it on with -freorder-blocks? > > From what I've heard on this list in recent days, > BB reorder gives good performance wins such that > most people would still want it on even if it did > increase code size a little. > > Cheers, > Ian
Re: Test Failures on sparc-rtems not repeatable by hand
On Tue, Mar 23, 2010 at 10:56 AM, Joel Sherrill wrote: > Hi, > > There are a number of failures in my latest run > of sparc-rtems4.10 but the ones I have gone back > and run the executable by hand actually pass. > I have no idea why this is happening and wondered > if someone had some insight as to what I should > look at next. From gcc.log > > Executing on host: /users/joel/test-gcc/b-gcc1-sparc/gcc/xgcc > -B/users/joel/test-gcc/b-gcc1-sparc/gcc/ > /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/snprintf-chk.c > /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/snprintf-chk-lib.c > /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/lib/main.c > gcc_tg.o -w -O0 -DSTACK_SIZE=2048 -isystem > /users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib/targ-include > -isystem /users/joel/test-gcc/gcc-svn/newlib/libc/include > -B/users/joel/test-gcc/install-svn/sparc-rtems4.10/sis/lib/ -specs bsp_specs > -qrtems -mcpu=cypress > -B/users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib/ > -L/users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib > /users/joel/test-gcc/b-gcc1-sparc/rtems_gcc_main.o -Wl,-wrap,exit > -Wl,-wrap,_exit -Wl,-wrap,main -Wl,-wrap,abort -lm -o > /users/joel/test-gcc/b-gcc1-sparc/gcc/testsuite/gcc/snprintf-chk.x0 > (timeout = 300) > PASS: gcc.c-torture/execute/builtins/snprintf-chk.c compilation, -O0 > sparc-rtems4.10-run is > /users/joel/test-gcc/install-svn/bin/sparc-rtems4.10-run > Running /users/joel/test-gcc/b-gcc1-sparc/gcc/testsuite/gcc/snprintf-chk.x0 > for maximum 60 seconds > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O0 > > So it compiles but apparently fails to run. OK > so I run it by hand: > > $ /users/joel/test-gcc/install-svn/bin/sparc-rtems4.10-run snprintf-chk.x0 > > *** EXIT code 0 > [j...@rtbf64b gcc]$ echo $? > 0 > > Any suggestions on how to track down what is going wrong? 
What are /users/joel/test-gcc/b-gcc1-sparc/rtems_gcc_main.o and -Wl,-wrap,main? Is main being replaced by something that doesn't return 0? janis
Re: Test Failures on sparc-rtems not repeatable by hand
On 03/23/2010 03:01 PM, Janis Johnson wrote: On Tue, Mar 23, 2010 at 10:56 AM, Joel Sherrill wrote: Hi, There are a number of failures in my latest run of sparc-rtems4.10 but the ones I have gone back and run the executable by hand actually pass. I have no idea why this is happening and wondered if someone had some insight as to what I should look at next. From gcc.log Executing on host: /users/joel/test-gcc/b-gcc1-sparc/gcc/xgcc -B/users/joel/test-gcc/b-gcc1-sparc/gcc/ /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/snprintf-chk.c /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/snprintf-chk-lib.c /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/lib/main.c gcc_tg.o -w -O0 -DSTACK_SIZE=2048 -isystem /users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib/targ-include -isystem /users/joel/test-gcc/gcc-svn/newlib/libc/include -B/users/joel/test-gcc/install-svn/sparc-rtems4.10/sis/lib/ -specs bsp_specs -qrtems -mcpu=cypress -B/users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib/ -L/users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib /users/joel/test-gcc/b-gcc1-sparc/rtems_gcc_main.o -Wl,-wrap,exit -Wl,-wrap,_exit -Wl,-wrap,main -Wl,-wrap,abort -lm -o /users/joel/test-gcc/b-gcc1-sparc/gcc/testsuite/gcc/snprintf-chk.x0 (timeout = 300) PASS: gcc.c-torture/execute/builtins/snprintf-chk.c compilation, -O0 sparc-rtems4.10-run is /users/joel/test-gcc/install-svn/bin/sparc-rtems4.10-run Running /users/joel/test-gcc/b-gcc1-sparc/gcc/testsuite/gcc/snprintf-chk.x0 for maximum 60 seconds FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O0 So it compiles but apparently fails to run. OK so I run it by hand: $ /users/joel/test-gcc/install-svn/bin/sparc-rtems4.10-run snprintf-chk.x0 *** EXIT code 0 [j...@rtbf64b gcc]$ echo $? 0 Any suggestions on how to track down what is going wrong? What are /users/joel/test-gcc/b-gcc1-sparc/rtems_gcc_main.o and -Wl,-wrap,main? 
Is main being replaced by something that doesn't return 0?

rtems_gcc_main.c is a file with the RTEMS OS configuration in it. It specifies where to start running the user program (at main), the stack size, etc. It does not contain main() -- just a pointer to main().

We don't return a value. We set this for the simulator:

    set_board_info needs_status_wrapper 1

This turns on the linker option to wrap main() and enables some standard support code for the main wrapper to take the exit code and print it to stdout. Or at least that's what it used to do.

Is there any way to get some visibility as to why the scripting thinks it is failing?

--joel

-- Joel Sherrill, Ph.D.
Director of Research & Development, On-Line Applications Research
Huntsville AL 35805, (256) 722-9985
joel.sherr...@oarcorp.com
Ask me about RTEMS: a free RTOS. Support Available.
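As a rough illustration of the status-wrapper idea described above: when the target cannot return an exit status, the wrapper prints a marker line such as the "*** EXIT code 0" seen in the log, and whoever runs the test has to recover the status from the captured output. The sketch below is an assumption about that general approach, not DejaGnu's actual code (which is written in Tcl); the function name find_exit_status is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Scan captured program output for the "*** EXIT code N" marker.
   Returns 1 and stores N in *STATUS if the marker is found,
   0 if the output contains no usable marker.  */
int
find_exit_status (const char *output, int *status)
{
  static const char marker[] = "*** EXIT code ";
  const char *p;
  char *end;
  long value;

  assert (output != NULL && status != NULL);

  p = strstr (output, marker);
  if (p == NULL)
    return 0;

  p += sizeof marker - 1;          /* skip past the marker text */
  value = strtol (p, &end, 10);
  if (end == p)                    /* marker present but no number */
    return 0;

  *status = (int) value;
  return 1;
}
```

If a harness works this way, a run whose output never reaches it (or arrives without the marker) would be reported as a failure even though the same binary prints the marker when run by hand, which is one thing worth checking here.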
Re: BB reorder forced off for -Os
> From: Ian Bolton [mailto:bol...@icerasemi.com] > > Is there any reason why BB reorder has been disabled > > in bb-reorder.c for -Os, such that you can't even > > turn it on with -freorder-blocks? On Tue, Mar 23, 2010 at 12:21:05PM -0700, Paul Koning wrote: > Does -Os mean "optimize even if it makes things a bit bigger" or does it > mean "optimize only to make it smaller"? If the latter then the current > behavior would appear to be the correct one. The intent of -Os is to say that speed matters less than size. This would argue against using any optimization that can increase code size *by default*. However, if the user explicitly says -freorder-blocks on the command line, then he/she is overriding part of -Os, saying that desired behavior is to do the specified optimization, but otherwise optimize for space. Also, while some combinations of options might not be possible, at the least, if a user asks for some pass to run with an -f switch and the pass isn't run, there should at least be a warning to that effect (IMHO).
Re: Test Failures on sparc-rtems not repeatable by hand
On Tue, Mar 23, 2010 at 1:20 PM, Joel Sherrill wrote: > On 03/23/2010 03:01 PM, Janis Johnson wrote: >> >> On Tue, Mar 23, 2010 at 10:56 AM, Joel Sherrill >> wrote: >> >>> >>> Hi, >>> >>> There are a number of failures in my latest run >>> of sparc-rtems4.10 but the ones I have gone back >>> and run the executable by hand actually pass. >>> I have no idea why this is happening and wondered >>> if someone had some insight as to what I should >>> look at next. From gcc.log >>> >>> Executing on host: /users/joel/test-gcc/b-gcc1-sparc/gcc/xgcc >>> -B/users/joel/test-gcc/b-gcc1-sparc/gcc/ >>> >>> /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/snprintf-chk.c >>> >>> /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/snprintf-chk-lib.c >>> >>> /users/joel/test-gcc/gcc-svn/gcc/testsuite/gcc.c-torture/execute/builtins/lib/main.c >>> gcc_tg.o -w -O0 -DSTACK_SIZE=2048 -isystem >>> /users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib/targ-include >>> -isystem /users/joel/test-gcc/gcc-svn/newlib/libc/include >>> -B/users/joel/test-gcc/install-svn/sparc-rtems4.10/sis/lib/ -specs >>> bsp_specs >>> -qrtems -mcpu=cypress >>> -B/users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib/ >>> -L/users/joel/test-gcc/b-gcc1-sparc/sparc-rtems4.10/./newlib >>> /users/joel/test-gcc/b-gcc1-sparc/rtems_gcc_main.o -Wl,-wrap,exit >>> -Wl,-wrap,_exit -Wl,-wrap,main -Wl,-wrap,abort -lm -o >>> /users/joel/test-gcc/b-gcc1-sparc/gcc/testsuite/gcc/snprintf-chk.x0 >>> (timeout = 300) >>> PASS: gcc.c-torture/execute/builtins/snprintf-chk.c compilation, -O0 >>> sparc-rtems4.10-run is >>> /users/joel/test-gcc/install-svn/bin/sparc-rtems4.10-run >>> Running >>> /users/joel/test-gcc/b-gcc1-sparc/gcc/testsuite/gcc/snprintf-chk.x0 >>> for maximum 60 seconds >>> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O0 >>> >>> So it compiles but apparently fails to run. 
OK >>> so I run it by hand: >>> >>> $ /users/joel/test-gcc/install-svn/bin/sparc-rtems4.10-run >>> snprintf-chk.x0 >>> >>> *** EXIT code 0 >>> [j...@rtbf64b gcc]$ echo $? >>> 0 >>> >>> Any suggestions on how to track down what is going wrong? >>> >> >> What are /users/joel/test-gcc/b-gcc1-sparc/rtems_gcc_main.o and >> -Wl,-wrap,main? Is main being replaced by something that doesn't >> return 0? >> > > rtems_gcc_main.c is a file with the RTEMS OS configuration in it. > It specifies to start running the user program at main, the stack > size, etc. It does not contain main() -- just a pointer to main(). > > We don't return a value. We set this for the simulator. > > set_board_info needs_status_wrapper 1 > > Which turns on the linker option to wrap main() and > enables some standard support code to for the main > wrapper to take the exit code and print it stdout. > > Or at least that's what it used to do. > > Is there anyway to get some visibility as to why the > scripting thinks it is failing? You'll need to add some messages to DejaGnu's procedures in the .exp files in share/dejagnu, under where DejaGnu is installed on your system. I don't know much about the ones used for running on simulators or remote systems, it's all pretty awful. Janis
Re: How to get the Tree ARRAY_TYPE declaration size
Thank you very much! :)
Re: BB reorder forced off for -Os
On Tue, Mar 23, 2010 at 7:05 PM, Ian Bolton wrote: > Is there any reason why BB reorder has been disabled > in bb-reorder.c for -Os, such that you can't even > turn it on with -freorder-blocks? No, you should have the option to turn it on if you wish to do so. If that is not possible, I consider this a bug. If you open a PR and assign it to me, I'll look into it. > From what I've heard on this list in recent days, > BB reorder gives good performance wins such that > most people would still want it on even if it did > increase code size a little. FWIW there is already a PR with a request to add heuristics for BB-reorder to optimize for size. Ciao! Steven
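The rule argued for in this thread (-Os only chooses defaults, and an explicit -freorder-blocks should still win) can be written down as a small decision function. This is a sketch under simplifying assumptions, not the logic in bb-reorder.c: the enum, the function name and the "on by default unless optimizing for size" default are all hypothetical.

```c
#include <assert.h>

/* Whether the user gave -freorder-blocks / -fno-reorder-blocks.  */
enum user_flag { FLAG_UNSET = -1, FLAG_OFF = 0, FLAG_ON = 1 };

/* Return nonzero if block reordering should run.
   OPTIMIZE_SIZE is nonzero for -Os.  An explicit user choice always
   overrides the default implied by the optimization level.  */
int
effective_reorder_blocks (int optimize_size, enum user_flag user_choice)
{
  assert (user_choice == FLAG_UNSET
          || user_choice == FLAG_OFF
          || user_choice == FLAG_ON);

  if (user_choice != FLAG_UNSET)
    return user_choice == FLAG_ON;

  /* Simplified default: reorder unless optimizing for size.  */
  return !optimize_size;
}
```

With this shape, -Os alone gives 0, while -Os plus an explicit request gives 1, which is the behaviour Ian asked about.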
gcc-4.4-20100323 is now available
Snapshot gcc-4.4-20100323 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100323/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 157681

You'll find:

gcc-4.4-20100323.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.4-20100323.tar.bz2       C front end and core compiler
gcc-ada-4.4-20100323.tar.bz2        Ada front end and runtime
gcc-fortran-4.4-20100323.tar.bz2    Fortran front end and runtime
gcc-g++-4.4-20100323.tar.bz2        C++ front end and runtime
gcc-java-4.4-20100323.tar.bz2       Java front end and runtime
gcc-objc-4.4-20100323.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.4-20100323.tar.bz2  The GCC testsuite

Diffs from 4.4-20100316 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Ask for suggestions on init_caller_save
I'm fixing a bug caused by uninitialized caller-save pass data. One function in the test case uses the "optimize" attribute with the "O2" option, so even with -O0 on the command line, GCC runs the caller-save pass for that function. The problem is that init_caller_save is called in backend_init_target only if flag_caller_saves is set, and in this case flag_caller_saves is not yet set when we reach backend_init_target.

I think there are several ways to fix this bug, but I don't know which one I should (or can) take:

1. Always call init_caller_save in backend_init_target. But that seems wasteful in most -O0 cases.
2. Call init_caller_save in the IRA main function. But then it will be called multiple times unless we create a flag to remember whether it has been called. Maybe we can reuse test_reg or test_mem: if they are NULL_TREE, just call init_caller_save.
3. Call init_caller_save in handle_optimize_attribute. If flag_caller_saves is not set before parse_optimize_options but is set after, call init_caller_save. Since several functions might use the optimize attribute, we would also need a flag to remember whether init_caller_save has been called.
4. There are only three global functions in caller-save.c: init_save_areas, setup_save_areas, and save_call_clobbered_regs. We can add a check at the beginning of each of them: if the data has not been initialized, call init_caller_save first.

Any suggestions? Thanks in advance.

-- Jie Zhang CodeSourcery (650) 331-3385 x735
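Option 4 is the usual "initialize on first use" guard. The sketch below models it outside GCC, so every name is a hypothetical stand-in: init_caller_save_model plays the role of init_caller_save, and save_call_clobbered_regs_model plays the role of one of the three entry points. It only shows the shape of the check, not a patch.

```c
#include <assert.h>

/* Hypothetical flag remembering whether initialization has run.  */
static int caller_save_initialized;

/* Counts how often the (expensive) initialization actually ran.  */
static int init_count;

/* Stand-in for init_caller_save.  */
static void
init_caller_save_model (void)
{
  init_count++;
}

/* Guard placed at the top of each global entry point.  */
static void
ensure_caller_save_initialized (void)
{
  if (!caller_save_initialized)
    {
      init_caller_save_model ();
      caller_save_initialized = 1;
    }
}

/* Stand-in for one entry point, e.g. save_call_clobbered_regs.  */
void
save_call_clobbered_regs_model (void)
{
  ensure_caller_save_initialized ();
  assert (caller_save_initialized);
  /* ... the real work of the pass would go here ... */
}

/* Accessor so callers can observe how many times init ran.  */
int
caller_save_init_count (void)
{
  return init_count;
}
```

The appeal of this variant is that -O0 builds which never reach the pass pay nothing, and no caller has to know whether an optimize attribute turned the pass on.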
Compiler option for SSE4
Hi all, I'm using GCC 4.1.2 20070626 on a server with Intel Xeon X5570. How do I turn on the compiler option for SSE4? I've tried -msse4, -msse4.1 and -msse4.2, but they all returned the error message cc1: error: unrecognized command line option "-msse4.1" (for whichever option I tried). Thank you. Regards, Weidong
Re: Compiler option for SSE4
On 3/23/2010 11:02 PM, Rayne wrote: I'm using GCC 4.1.2 20070626 on a server with Intel Xeon X5570. How do I turn on the compiler option for SSE4? I've tried -msse4, -msse4.1 and -msse4.2, but they all returned the error message cc1: error: unrecognized command line option "-msse4.1" (for whichever option I tried). You would need a gcc version which supports sse4. As you said yourself, your version is approaching 3 years old. Actually, the more important option for Xeon 55xx, if you are vectorizing, is the -mtune=barcelona, which has been supported for about 2 years. Whether vectorizing or not, on an 8 core CPU, the OpenMP introduced in gcc 4.2 would be useful. This looks like a gcc-help mail list question, which is where you should submit any follow-up. -- Tim Prince
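Once a compiler that accepts -msse4.1 or -msse4.2 is in use, one way to confirm the option took effect is to test the predefined macros __SSE4_1__ and __SSE4_2__, which GCC defines when the corresponding instruction sets are enabled. A small sketch follows; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if this translation unit was compiled with SSE4.1 enabled.  */
int
built_with_sse41 (void)
{
#ifdef __SSE4_1__
  return 1;
#else
  return 0;
#endif
}

/* Return 1 if this translation unit was compiled with SSE4.2 enabled.  */
int
built_with_sse42 (void)
{
#ifdef __SSE4_2__
  return 1;
#else
  return 0;
#endif
}

/* Print a one-line summary to OUT.  */
void
report_sse4_support (FILE *out)
{
  assert (out != NULL);
  fprintf (out, "SSE4.1: %s, SSE4.2: %s\n",
           built_with_sse41 () ? "yes" : "no",
           built_with_sse42 () ? "yes" : "no");
}
```

Calling report_sse4_support (stdout) from a program built with and without the option shows whether the flag was picked up; on a compiler that rejects the option, such as the 4.1.2 release mentioned above, the build fails before this code is ever reached.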