Starting an OpenMP parallel section is extremely slow on a hyper-threaded Nehalem
Dear gcc list,

I noticed that starting an OpenMP parallel section takes a significant amount of time on Nehalem cpu's with hyper-threading enabled. The differences with HTT turned on and off are really huge:

- HTT disabled: about 100.000 parallel sections per second
- HTT enabled: about 15 parallel sections per second

Is this a known problem? It has apparently something to do with setting the cpu affinity; when I set the GOMP_CPU_AFFINITY environment variable to "0-7", then it is almost as fast as with HTT disabled...

This is the code I used to test it. Simply compile it with -fopenmp. I used 100.000 iterations instead of 100 to time it with HTT disabled.

    int main ()
    {
      int i;
      for (i = 0; i < 100; i++)
      {
        #pragma omp parallel
        { }
      }
    }

System specs:
OS: Ubuntu 9.10, amd64 (2.6.31-19-generic)
gcc: version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)
cpu: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz

Cheers,
Edwin
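The message above does not say how the rate was measured. A timed variant of the test could look like the following minimal sketch. It is not Edwin's code: the helper names `now_seconds`, `time_parallel_regions` and `regions_per_second` are made up for illustration, and the use of `clock_gettime` is an assumption (any wall-clock timer would do).

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in strict ISO modes */
#include <assert.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Enter and leave `iterations` empty parallel regions, as in the
   original test loop, and return the elapsed wall-clock seconds.
   Without -fopenmp the pragma is ignored and the loop is empty. */
double time_parallel_regions(int iterations)
{
    int i;
    double start;

    assert(iterations > 0);
    start = now_seconds();
    for (i = 0; i < iterations; i++) {
        #pragma omp parallel
        { }
    }
    return now_seconds() - start;
}

/* Convert a measurement into regions per second; returns 0.0 when the
   elapsed time is too small to be meaningful. */
double regions_per_second(int iterations, double elapsed)
{
    return elapsed > 0.0 ? (double)iterations / elapsed : 0.0;
}
```

Compiled with -fopenmp and called from a main function with 100 iterations, this allows comparing a plain run against one with GOMP_CPU_AFFINITY set to "0-7", the setting reported above as nearly restoring the HTT-disabled speed.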
Re: Starting an OpenMP parallel section is extremely slow on a hyper-threaded Nehalem
On 2/11/2010 2:00 AM, Edwin Bennink wrote:
> Dear gcc list,
>
> I noticed that starting an OpenMP parallel section takes a significant
> amount of time on Nehalem cpu's with hyper-threading enabled.

If you think a question might be related to gcc, but don't know which forum to use, gcc-help is more appropriate. As your question is whether there is a way to avoid anomalous behaviors when an old Ubuntu is run on a CPU released after that version of Ubuntu, an Ubuntu forum might be more appropriate. A usual way is to shut off HyperThreading in the BIOS when running on a distro which has trouble with it.

I do find your observation interesting. As far as I know, the oldest distro which works well on Core I7 is RHEL5.2 x86_64, which I run, with updated gcc and binutils, and HT disabled, as I never run applications which could benefit from HT.

--
Tim Prince
Re: Starting an OpenMP parallel section is extremely slow on a hyper-threaded Nehalem
Thanks Tim,

I thought that the gcc list was the most appropriate one regarding the gomp implementation, but I'll post this question on the gcc-help list.

By the way, Ubuntu 9.10 is the latest version (dated Oct. 2009). HTT works fine for daily use, but massively parallel applications show some odd behaviour: depending on the structure of the algorithm, some pieces of code run significantly faster (about 10%) with HTT enabled, while other pieces of code run slower (some more than 50%). This slowdown happens due to parallel sections inside loops...

Edwin

Tim Prince wrote:
> On 2/11/2010 2:00 AM, Edwin Bennink wrote:
>> Dear gcc list,
>>
>> I noticed that starting an OpenMP parallel section takes a significant
>> amount of time on Nehalem cpu's with hyper-threading enabled.
>
> If you think a question might be related to gcc, but don't know which
> forum to use, gcc-help is more appropriate. As your question is whether
> there is a way to avoid anomalous behaviors when an old Ubuntu is run on
> a CPU released after that version of Ubuntu, an Ubuntu forum might be
> more appropriate. A usual way is to shut off HyperThreading in the BIOS
> when running on a distro which has trouble with it.
>
> I do find your observation interesting. As far as I know, the oldest
> distro which works well on Core I7 is RHEL5.2 x86_64, which I run, with
> updated gcc and binutils, and HT disabled, as I never run applications
> which could benefit from HT.
RTL question for I64
Greetings,

A pointer would be much appreciated!

In ia64.md for *cmpdi_normal this is found:
"cmp.%C1 %0, %I0 = %3, %r2"

Where are %C, %I, %r described?

--Doug
Re: RTL question for I64
On Thu, Feb 11, 2010 at 09:43:31AM -0800, Douglas B Rupp wrote:
> A pointer would be much appreciated!
>
> In ia64.md for *cmpdi_normal this is found:
> "cmp.%C1 %0, %I0 = %3, %r2"
>
> Where are %C, %I, %r described?

Above gcc/config/ia64/ia64.c:ia64_print_operand.

-Nathan
gcc-4.5-20100211 is now available
Snapshot gcc-4.5-20100211 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100211/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 156726

You'll find:

gcc-4.5-20100211.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.5-20100211.tar.bz2       C front end and core compiler
gcc-ada-4.5-20100211.tar.bz2        Ada front end and runtime
gcc-fortran-4.5-20100211.tar.bz2    Fortran front end and runtime
gcc-g++-4.5-20100211.tar.bz2        C++ front end and runtime
gcc-java-4.5-20100211.tar.bz2       Java front end and runtime
gcc-objc-4.5-20100211.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.5-20100211.tar.bz2  The GCC testsuite

Diffs from 4.5-20100204 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
[C++-0x] Status of constexpr
Greetings,

I have a patch in my tree that employs the constexpr keyword in most of the places in the library where it is required in n3000. This patch bootstraps and causes no new regressions on MacOS at least. I still need test cases.

My question is this: Is constexpr in good enough shape to put this in? Some messages seemed to suggest it wasn't finished yet.

Thanks,
Ed
Re: Zero extractions and zero extends
Jean Christophe Beyler writes:
> typedef struct sTestUnsignedChar {
>     uint64_t a:1;
> }STestUnsignedChar;
>
> uint64_t getU (STestUnsignedChar a)
> {
>     return a.a;
> }
>
>
> I get this in the DCE pass :
> (insn 6 3 7 2 bitfield2.c:8 (set (subreg:DI (reg:QI 75) 0)
>         (zero_extract:DI (reg/v:DI 73 [ a ])
>             (const_int 1 [0x1])
>             (const_int 0 [0x0]))) 63 {extzvdi} (expr_list:REG_DEAD
> (reg/v:DI 73 [ a ])
>         (nil)))
>
> (insn 7 6 12 2 bitfield2.c:8 (set (reg:DI 74)
>         (zero_extend:DI (reg:QI 75))) 51 {zero_extendqidi2}
> (expr_list:REG_DEAD (reg:QI 75)
>         (nil)))
>
>
> (on the x86 port, I get a and instead of the zero_extract)
>
> However, on the combine pass both stay, whereas in the x86 port, the
> zero_extend is removed. Where is this decided exactly ?
> I've checked the costs of the instructions, I have the same thing as
> the x86 port.

Maybe it is turned into an (and:DI .. (const_int 1)) and you don't recognize that? Check your combine dump file, that should tell you what is the pattern that combine came up with while dealing with these two insns.

Adam
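At the C level, the equivalence behind this suggestion can be checked directly: reading a 1-bit unsigned field yields the same value as masking with 1, and that value is unchanged by a further 8-bit to 64-bit zero extension. The following is only a sketch in plain C, not GCC internals; `make_field`, `mask_low_bit`, `zero_extend_qi_to_di`, `field_matches_mask` and `check_equivalence` are hypothetical names, and a `uint64_t` bit-field is an implementation-defined type that GCC accepts.

```c
#include <assert.h>
#include <stdint.h>

/* Struct and accessor as in the quoted message. */
typedef struct sTestUnsignedChar {
    uint64_t a:1;
} STestUnsignedChar;

uint64_t getU(STestUnsignedChar a)
{
    return a.a;
}

/* Hypothetical helper: store v into the 1-bit field. Assignment to an
   unsigned bit-field reduces the value modulo 2^width, here modulo 2. */
STestUnsignedChar make_field(uint64_t v)
{
    STestUnsignedChar s;
    s.a = v;
    return s;
}

/* The value an (and:DI x (const_int 1)) would compute. */
uint64_t mask_low_bit(uint64_t v)
{
    return v & 1;
}

/* The value a QImode to DImode zero_extend would compute. */
uint64_t zero_extend_qi_to_di(uint8_t v)
{
    return (uint64_t)v;
}

/* Nonzero when the field read equals the mask and the extra
   zero extension leaves the value unchanged. */
int field_matches_mask(uint64_t v)
{
    uint64_t field = getU(make_field(v));
    return field == mask_low_bit(v)
        && zero_extend_qi_to_di((uint8_t)field) == field;
}

/* Check the equivalence for a handful of inputs. */
void check_equivalence(void)
{
    uint64_t v;
    for (v = 0; v < 16; v++)
        assert(field_matches_mask(v));
}
```

Since the extracted value is always 0 or 1, a single AND with 1 can stand in for both insns, which is the form to look for in the combine dump.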