Starting an OpenMP parallel section is extremely slow on a hyper-threaded Nehalem

2010-02-11 Thread Edwin Bennink

Dear gcc list,


I noticed that starting an OpenMP parallel section takes a significant 
amount of time on Nehalem CPUs with hyper-threading (HTT) enabled.

The difference between HTT turned on and off is huge:

- HTT disabled: about 100,000 parallel sections per second
- HTT enabled: about 15 parallel sections per second

Is this a known problem? It apparently has something to do with setting 
the CPU affinity: when I set the GOMP_CPU_AFFINITY environment variable 
to "0-7", it is almost as fast as with HTT disabled.
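
One way to see what the affinity setting changes is to print the affinity
mask the process actually starts with. The sketch below is not part of the
original post. It assumes Linux with glibc (sched_getaffinity and CPU_COUNT
are GNU extensions), and the function names allowed_cpu_count and
report_affinity are made up for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of logical CPUs the calling process may run on, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Print the libgomp affinity variable and the size of the current mask. */
void report_affinity(void)
{
    const char *env = getenv("GOMP_CPU_AFFINITY");

    printf("GOMP_CPU_AFFINITY=%s\n", env ? env : "(unset)");
    printf("CPUs in affinity mask: %d\n", allowed_cpu_count());
}
```

Calling report_affinity() at the start of main, once with and once without
the variable set, shows whether the starting mask differs between the two
runs.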



This is the code I used to test it. Simply compile it with -fopenmp. I 
used 100,000 iterations instead of 100 to time it with HTT disabled.




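int main (void) {
   int i;

   /* Enter and leave an empty parallel region on every iteration,
      so the run time is dominated by starting the thread team. */
   for (i = 0; i < 100; i++) {
#pragma omp parallel
      {
      }
   }
   return 0;
}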
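
To turn such a loop into a rate that can be compared between machines, the
number of regions can be divided by the elapsed time. This is a sketch, not
the code used for the figures above. The names now_seconds and
time_parallel_regions are hypothetical, and omp_get_wtime is only used when
the file is built with -fopenmp (otherwise the pragma is ignored and the
loop runs serially).

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds with OpenMP; CPU time as a fallback without it. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Open `iterations` empty parallel regions and return the elapsed seconds. */
double time_parallel_regions(int iterations)
{
    double start = now_seconds();

    for (int i = 0; i < iterations; i++) {
        #pragma omp parallel
        {
            /* intentionally empty: only fork/join overhead is measured */
        }
    }
    return now_seconds() - start;
}
```

Dividing the iteration count by the returned time gives parallel regions
per second.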




System specs:
OS: Ubuntu 9.10, amd64 (2.6.31-19-generic)
gcc: version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)
cpu: Intel(R) Core(TM) i7 CPU 920  @ 2.67GHz


Cheers,
Edwin


Re: Starting an OpenMP parallel section is extremely slow on a hyper-threaded Nehalem

2010-02-11 Thread Tim Prince

On 2/11/2010 2:00 AM, Edwin Bennink wrote:

Dear gcc list,


I noticed that starting an OpenMP parallel section takes a significant 
amount of time on Nehalem CPUs with hyper-threading enabled.


If you think a question might be related to gcc, but don't know which 
forum to use, gcc-help is more appropriate.  As your question is whether 
there is a way to avoid anomalous behaviors when an old Ubuntu is run on 
a CPU released after that version of Ubuntu, an Ubuntu forum might be 
more appropriate.  A usual way is to shut off HyperThreading in the BIOS 
when running on a distro which has trouble with it.  I do find your 
observation interesting.
As far as I know, the oldest distro which works well on Core i7 is 
RHEL5.2 x86_64, which I run, with updated gcc and binutils, and HT 
disabled, as I never run applications which could benefit from HT.


--
Tim Prince



Re: Starting an OpenMP parallel section is extremely slow on a hyper-threaded Nehalem

2010-02-11 Thread Edwin Bennink
Thanks Tim, I thought that the gcc list was the most appropriate one 
regarding the gomp implementation, but I'll post this question on the 
gcc-help list.


By the way, Ubuntu 9.10 is the latest version (released October 2009). 
HTT works fine for daily use, but massively parallel applications show 
some odd behaviour:
Depending on the structure of the algorithm, some pieces of code run 
significantly faster (about 10%) with HTT enabled, while other pieces of 
code run slower (some by more than 50%). The slowdown is caused by 
parallel sections inside loops.
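
If the cost really is in opening a parallel region on every loop iteration,
a common workaround is to open the region once and keep only the
work-sharing construct inside the loop. A minimal sketch, assuming the
outer steps must run in order. The function names and the scaling workload
are hypothetical and not taken from the application mentioned above, and
whether this helps on the affected system has not been measured here.

```c
#include <stddef.h>

/* Starts a new parallel region on every outer step. */
void scale_region_per_step(double *data, size_t n, int steps, double factor)
{
    for (int s = 0; s < steps; s++) {
        #pragma omp parallel for
        for (long i = 0; i < (long)n; i++)
            data[i] *= factor;
    }
}

/* Starts one parallel region and reuses the same team for every step. */
void scale_single_region(double *data, size_t n, int steps, double factor)
{
    #pragma omp parallel
    {
        /* s is declared inside the region, so each thread has its own copy */
        for (int s = 0; s < steps; s++) {
            #pragma omp for
            for (long i = 0; i < (long)n; i++)
                data[i] *= factor;
            /* the implicit barrier after "omp for" keeps the steps ordered */
        }
    }
}
```

Both functions compute the same result; the second one pays the region
start-up cost once instead of once per step.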


Edwin



RTL question for I64

2010-02-11 Thread Douglas B Rupp

Greetings,

A pointer would be much appreciated!

In ia64.md for *cmpdi_normal this is found:
"cmp.%C1 %0, %I0 = %3, %r2"

Where are %C, %I, %r described?

--Doug


Re: RTL question for I64

2010-02-11 Thread Nathan Froyd
On Thu, Feb 11, 2010 at 09:43:31AM -0800, Douglas B Rupp wrote:
> A pointer would be much appreciated!
>
> In ia64.md for *cmpdi_normal this is found:
> "cmp.%C1 %0, %I0 = %3, %r2"
>
> Where are %C, %I, %r described?

Above gcc/config/ia64/ia64.c:ia64_print_operand.

-Nathan


gcc-4.5-20100211 is now available

2010-02-11 Thread gccadmin
Snapshot gcc-4.5-20100211 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100211/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 156726

You'll find:

gcc-4.5-20100211.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.5-20100211.tar.bz2       C front end and core compiler
gcc-ada-4.5-20100211.tar.bz2        Ada front end and runtime
gcc-fortran-4.5-20100211.tar.bz2    Fortran front end and runtime
gcc-g++-4.5-20100211.tar.bz2        C++ front end and runtime
gcc-java-4.5-20100211.tar.bz2       Java front end and runtime
gcc-objc-4.5-20100211.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.5-20100211.tar.bz2  The GCC testsuite

Diffs from 4.5-20100204 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[C++-0x] Status of constexpr

2010-02-11 Thread Ed Smith-Rowland

Greetings,

I have a patch in my tree that employs the constexpr keyword in most of 
the places in the library where it is required in n3000.  This patch 
bootstraps and causes no new regressions on MacOS at least.  I still 
need test cases.


My question is this: Is constexpr in good enough shape to put this in? 
Some messages on the list seemed to suggest it wasn't finished yet.


Thanks,

Ed



Re: Zero extractions and zero extends

2010-02-11 Thread Adam Nemet
Jean Christophe Beyler writes:
> typedef struct sTestUnsignedChar {
>     uint64_t a:1;
> } STestUnsignedChar;
>
> uint64_t getU (STestUnsignedChar a)
> {
>     return a.a;
> }
>
>
> I get this in the DCE pass :
> (insn 6 3 7 2 bitfield2.c:8 (set (subreg:DI (reg:QI 75) 0)
>         (zero_extract:DI (reg/v:DI 73 [ a ])
>             (const_int 1 [0x1])
>             (const_int 0 [0x0]))) 63 {extzvdi} (expr_list:REG_DEAD
>         (reg/v:DI 73 [ a ])
>         (nil)))
>
> (insn 7 6 12 2 bitfield2.c:8 (set (reg:DI 74)
>         (zero_extend:DI (reg:QI 75))) 51 {zero_extendqidi2}
>     (expr_list:REG_DEAD (reg:QI 75)
>         (nil)))
>
>
> (on the x86 port, I get an "and" instead of the zero_extract)
>
> However, after the combine pass both insns remain, whereas in the x86
> port the zero_extend is removed. Where exactly is this decided?
> I've checked the costs of the instructions, and they are the same as
> in the x86 port.

Maybe it is turned into an (and:DI .. (const_int 1)) and you don't
recognize that?  Check your combine dump file; it should tell you which
pattern combine came up with while dealing with these two insns.
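
The reason an "and" is a plausible result: extracting a 1-bit field at bit
position 0 and zero-extending it is the same operation as masking with 1.
The C sketch below only illustrates that arithmetic. It is not GCC code,
zero_extract64 is a hypothetical helper, and it assumes bit 0 is the least
significant bit (RTL bit numbering depends on the target's BITS_BIG_ENDIAN).

```c
#include <stdint.h>

/* Extract `len` bits starting at bit `pos` (bit 0 = least significant)
   and zero-extend them to 64 bits.  Requires pos < 64 and len >= 1. */
uint64_t zero_extract64(uint64_t x, unsigned len, unsigned pos)
{
    /* Avoid shifting by 64, which is undefined in C. */
    uint64_t mask = (len >= 64) ? ~UINT64_C(0)
                                : ((UINT64_C(1) << len) - 1);

    return (x >> pos) & mask;
}
```

With len = 1 and pos = 0 the shift disappears and the mask is 1, so
zero_extract64(x, 1, 0) equals x & 1 for every x.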

Adam