Re: gomp slowness

2007-11-03 Thread skaller
On Sat, 2007-11-03 at 10:35 +0100, Sylvain Pion wrote: > skaller wrote : > > I can tell you I definitely considered using FS for the > > Felix thread frame pointer to save passing that pointer > > between every function.. > > But then, won't you end up with an implementation very similar > to __

Re: gomp slowness

2007-11-03 Thread Sylvain Pion
skaller wrote : > I can tell you I definitely considered using FS for the Felix thread frame pointer to save passing that pointer between every function.. But then, won't you end up with an implementation very similar to __thread?? -- Sylvain Pion INRIA Sophia-Antipolis Geometrica Project-Team

Re: gomp slowness

2007-11-03 Thread Jakub Jelinek
On Fri, Nov 02, 2007 at 11:09:33PM -0700, Ian Lance Taylor wrote: > skaller <[EMAIL PROTECTED]> writes: > > > > As I said before, the register is only stolen for code which actually > > > uses TLS. > > > > So scanning that document, for x86_64, fs is used in startup > > code, presumably if, and o

Re: gomp slowness

2007-11-02 Thread Ian Lance Taylor
skaller <[EMAIL PROTECTED]> writes: > > As I said before, the register is only stolen for code which actually > > uses TLS. > > So scanning that document, for x86_64, fs is used in startup > code, presumably if, and only if, there is a linker section > containing __thread variables? Yes. Ian
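
To make the `__thread` case in this exchange concrete, here is a minimal sketch (not taken from the thread) of a GCC thread-local variable. The names `tls_counter`, `worker` and `run_two_workers` are made up for illustration. On x86_64 GNU/Linux, GCC would typically address such a variable relative to the fs segment base, which is the register use being argued about; other targets may do it differently. Build with `gcc -pthread`.

```c
#include <pthread.h>
#include <stddef.h>

/* One instance of this variable exists per thread (GCC extension). */
static __thread int tls_counter = 0;

/* Hypothetical worker: bumps its own copy and reports the final value. */
static void *worker(void *arg)
{
    int *result = arg;
    for (int i = 0; i < 1000; i++)
        tls_counter++;
    *result = tls_counter;
    return NULL;
}

/* Runs two workers and returns the calling thread's own counter,
   or -1 if a thread could not be created. */
int run_two_workers(int results[2])
{
    pthread_t t[2];
    int created = 0;

    for (; created < 2; created++)
        if (pthread_create(&t[created], NULL, worker, &results[created]) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);

    return created == 2 ? tls_counter : -1;
}
```

Each worker sees its own copy, so both report 1000, while the calling thread's copy is never touched and stays at 0.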

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 22:35 -0700, Ian Lance Taylor wrote: > skaller <[EMAIL PROTECTED]> writes: > > > Neko, for example, uses a register. AFAIK MLton does the > > same kind of thing. If gcc team thinks ANY register is free > > to steal they'd be wrong -- that doesn't mean it shouldn't > > be use

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 23:56 -0400, Robert Dewar wrote: > skaller wrote: > You really can't be serious in your comment about fs, if you > understand the architecture ... You're just not thinking the same way I am. A CPU has state, the compiler and application program manage that state. If the co

Re: gomp slowness

2007-11-02 Thread Ian Lance Taylor
skaller <[EMAIL PROTECTED]> writes: > Neko, for example, uses a register. AFAIK MLton does the > same kind of thing. If gcc team thinks ANY register is free > to steal they'd be wrong -- that doesn't mean it shouldn't > be used, just that it definitely is NOT free. To be clear, it is not the gcc

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 23:54 -0400, Robert Dewar wrote: > skaller wrote: > > On Fri, 2007-11-02 at 18:45 -0700, Andrew Pinski wrote: > >>> This is not true. If you use a register for any purpose like this, > >>> it can't be used for anything else and that has a cost. > >> This is a segment register

Re: gomp slowness

2007-11-02 Thread Robert Dewar
skaller wrote: > This is not true. If you use a register for any purpose like this, it can't be used for anything else and that has a cost. On x86_64 which I use, every register is valuable. Don't you dare take one away, it would have a serious performance impact AND it would stop ME using that r

Re: gomp slowness

2007-11-02 Thread Robert Dewar
skaller wrote: > On Fri, 2007-11-02 at 18:45 -0700, Andrew Pinski wrote: > > > This is not true. If you use a register for any purpose like this, it can't be used for anything else and that has a cost. > > This is a segment register. Please go and read about what segment registers. > I know how the x86 wo

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 18:45 -0700, Andrew Pinski wrote: > > This is not true. If you use a register for any purpose like this, > > it can't be used for anything else and that has a cost. > > This is a segment register. Please go and read about what segment > registers. I know how the x86 works

Re: gomp slowness

2007-11-02 Thread skaller
On Sat, 2007-11-03 at 12:27 +1100, skaller wrote: > On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote: > Of course there is. It's called design by contract. > I do it all the time. I am appalled at code bases like > GTK and interfaces like OpenMP which get such really > basic things wrong

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 13:47 -0600, Joel Dice wrote: > > So any (application) program needing TLS (other than the stack) > > is automatically badly designed. I've been writing code for > > three decades without using any global variables, ever since > > I learned about re-entrancy. > > While I a

Re: gomp slowness

2007-11-02 Thread Ian Lance Taylor
skaller <[EMAIL PROTECTED]> writes: > On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote: > > skaller <[EMAIL PROTECTED]> writes: > > > > > On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote: > > > > skaller <[EMAIL PROTECTED]> writes: > > > > > > > In a C executable, TLS requires

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 15:31 -0400, Robert Dewar wrote: > Olivier Galibert wrote: > There are lots of cases where global thread specific variables > are useful in practice, ask anyone who has programmed real world > large scale real time embedded programs. No. And I have done just that myself. Th

Re: gomp slowness

2007-11-02 Thread Andrew Pinski
> This is not true. If you use a register for any purpose like this, > it can't be used for anything else and that has a cost. This is a segment register. Please go and read about what segment registers. They are not real registers and cannot be used for anything except memory accesses. They da

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 20:00 +0100, Olivier Galibert wrote: > On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote: > > My argument is basically: there is no need for any such > > feature in a well written program. Each thread already has > > its own local stack. Global variables should not be u

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 19:56 +0100, Olivier Galibert wrote: > On Sat, Nov 03, 2007 at 03:31:14AM +1100, skaller wrote: > > On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote: > > > I think you need to look at the TLS access code before deciding that > > > it has bad performance. > > > > Yo

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote: > skaller <[EMAIL PROTECTED]> writes: > > > On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote: > > > skaller <[EMAIL PROTECTED]> writes: > > > > > In a C executable, TLS requires one extra machine register. > > > > You mean gc

Re: gomp slowness

2007-11-02 Thread Joel Dice
On Sat, 3 Nov 2007, skaller wrote: > On Fri, 2007-11-02 at 10:46 -0400, Daniel Jacobowitz wrote: > > On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote: > > > The only way I can interpret your comments is that you are assuming that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed sh

Re: gomp slowness

2007-11-02 Thread Robert Dewar
Olivier Galibert wrote: > On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote: > > My argument is basically: there is no need for any such feature in a well written program. Each thread already has its own local stack. Global variables should not be used in the first place (except for signals etc

Re: gomp slowness

2007-11-02 Thread Olivier Galibert
On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote: > My argument is basically: there is no need for any such > feature in a well written program. Each thread already has > its own local stack. Global variables should not be used > in the first place (except for signals etc where > there is no

Re: gomp slowness

2007-11-02 Thread Olivier Galibert
On Sat, Nov 03, 2007 at 03:31:14AM +1100, skaller wrote: > On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote: > > I think you need to look at the TLS access code before deciding that > > it has bad performance. > > You already said it costs a register? That's a REALLY high cost > to pay t

Re: gomp slowness

2007-11-02 Thread Ian Lance Taylor
skaller <[EMAIL PROTECTED]> writes: > On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote: > > skaller <[EMAIL PROTECTED]> writes: > > > In a C executable, TLS requires one extra machine register. > > You mean gcc? I don't understand the question. I mean in a C/C++ executable which use

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 10:46 -0400, Daniel Jacobowitz wrote: > On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote: > > The only way I can interpret your comments is that you are assuming > > that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed shared > > library). But stac

Re: gomp slowness

2007-11-02 Thread skaller
On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote: > skaller <[EMAIL PROTECTED]> writes: > In a C executable, TLS requires one extra machine register. You mean gcc? > TLS > variables are accessed via offsets from that register. So what's the > significant difference between that and

Re: gomp slowness

2007-11-02 Thread Daniel Jacobowitz
On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote: > The only way I can interpret your comments is that you are assuming > that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed shared > library). But stack based thread local storage won't work for > dlopen'ed shared librar

Re: gomp slowness

2007-11-02 Thread Ian Lance Taylor
skaller <[EMAIL PROTECTED]> writes: > A really cool (non-Posix) implementation would put TLS globals > on the stack base .. but this does require at least one extra > machine register in languages like C which don't provide > a static display (pointer to parent function). For languages > that do,

Re: gomp slowness

2007-11-02 Thread skaller
On Thu, 2007-11-01 at 21:02 -0700, Gary Funck wrote: > On Thu, Oct 18, 2007 at 11:42:52AM +1000, skaller wrote: > > > > DO you know how thread local variables are handled? > > [Not using Posix TLS I hope .. that would be a disaster] > > Would you please elaborate? Sure .. > What's wrong with

Re: gomp slowness

2007-11-01 Thread Gary Funck
On Thu, Oct 18, 2007 at 11:42:52AM +1000, skaller wrote: > > DO you know how thread local variables are handled? > [Not using Posix TLS I hope .. that would be a disaster] Would you please elaborate? What's wrong with the POSIX TLS implementation? Do you know of any studies? I ask, because we

Re: gomp slowness

2007-10-20 Thread skaller
On Sat, 2007-10-20 at 22:32 +0400, Tomash Brechko wrote: > I'm not sure what OpenMP spec says about default data scope (too lazy > to read through), > but it seems that examples from > http://kallipolis.com/openmp/2.html assume default(private), while GCC > GOMP defaults to shared. In your case

Re: gomp slowness

2007-10-20 Thread Tomash Brechko
I'm not sure what OpenMP spec says about default data scope (too lazy to read through), but it seems that examples from http://kallipolis.com/openmp/2.html assume default(private), while GCC GOMP defaults to shared. In your case, #pragma omp parallel for shared(A, row, col) for (i = k+1; i
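
A minimal sketch of the data-scope point, assuming an LU-style elimination step. The function `eliminate_step` and the row-major layout are illustrative guesses, not the code under discussion. In C, variables declared outside a parallel region are shared unless a clause says otherwise, and only the loop variable of the `parallel for` itself is made private automatically. `default(none)` makes the compiler reject any variable whose scope was not stated, so an inner index such as `j` cannot silently end up shared.

```c
#include <stddef.h>

/* Hypothetical helper: one elimination step (pivot row k) on an
   n x n matrix stored row-major in A. */
void eliminate_step(double *A, int n, int k)
{
    int i, j; /* declared outside the parallel region on purpose */

    /* i is the loop variable of the parallel for, so it is private
       without being listed. j is not: without private(j) it would be
       shared, and the threads would race on it. */
    #pragma omp parallel for default(none) shared(A, n, k) private(j)
    for (i = k + 1; i < n; i++) {
        /* factor is declared inside the loop body, so it is private. */
        double factor = A[i * n + k] / A[k * n + k];
        for (j = k; j < n; j++)
            A[i * n + j] -= factor * A[k * n + j];
    }
}
```

Compiled without `-fopenmp` the pragma is ignored and the function runs serially with the same result, which gives a simple baseline for checking the parallel version.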

RE: gomp slowness

2007-10-18 Thread Dave Korn
On 19 October 2007 02:45, tim prince wrote: > skaller wrote: >> On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote: >> >>> skaller wrote: >>> >> >> >>> I don't know of any OpenMP compiler which would correct the nesting of >>> parallel loops in your LU. I have assumed that OpenMP doesn't all

Re: gomp slowness

2007-10-18 Thread tim prince
skaller wrote: > On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote: > > skaller wrote: > > I don't know of any OpenMP compiler which would correct the nesting of parallel loops in your LU. I have assumed that OpenMP doesn't allow such optimization; you have to get it right yourself.

Re: gomp slowness

2007-10-18 Thread skaller
On Thu, 2007-10-18 at 13:04 +0200, Jakub Jelinek wrote: > On Thu, Oct 18, 2007 at 02:47:44PM +1000, skaller wrote: > On LU_mp.c according to oprofile more than 95% of time is spent in the inner > loop, rather than any kind of waiting. On quad core with OMP_NUM_THREADS=4 > all 4 threads eat 99.9%

Re: gomp slowness

2007-10-18 Thread skaller
On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote: > skaller wrote: > I don't know of any OpenMP compiler which would correct the nesting of > parallel loops in your LU. I have assumed that OpenMP doesn't allow > such optimization; you have to get it right yourself. Can you explain? This co

Re: gomp slowness

2007-10-18 Thread Tim Prince
skaller wrote: > On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote: > > skaller wrote: > > > On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote: > > > > skaller wrote: > > > > It would be interesting to try with another compiler. Do you have access to another OpenMP-enabled compiler

Re: gomp slowness

2007-10-18 Thread Jakub Jelinek
On Thu, Oct 18, 2007 at 02:47:44PM +1000, skaller wrote: > > On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote: > > skaller wrote: > > > On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote: > > >> skaller wrote: > > > > > >> It would be interesting to try with another compiler. Do yo

Re: gomp slowness

2007-10-18 Thread Biplab Kumar Modak
Hi All, I did some tests with GCC-4.2.2 (MinGW build) and the source code provided by skaller. The compilation log is as follows. -- Build: Release in Test --- [ 50.0%] mingw32-gcc.exe -Wall -fexceptions -fopenmp -O2 -IC:\MinGW\include -c C:\Projects\Test\combined_m

Re: gomp slowness

2007-10-18 Thread Biplab Kumar Modak
skaller wrote: > OK, attached. Hi skaller, I think I've wasted my money. They do not ship OpenMP headers and libs with Standard Edition. :( Best Regards, Biplab

Re: gomp slowness

2007-10-17 Thread skaller
On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote: > skaller wrote: > > On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote: > >> skaller wrote: > > > >> It would be interesting to try with another compiler. Do you have access > >> to another OpenMP-enabled compiler? > > > > Unfort

Re: gomp slowness

2007-10-17 Thread Biplab Kumar Modak
skaller wrote: > On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote: > > skaller wrote: > > It would be interesting to try with another compiler. Do you have access to another OpenMP-enabled compiler? > Unfortunately no, unless MSVC++ in VS2005 has openMP. I have an Intel licence but they're too ti

Re: gomp slowness

2007-10-17 Thread Biplab Kumar Modak
Ross Ridge wrote: > skaller writes: > > Unfortunately no, unless MSVC++ in VS2005 has openMP. > I don't know if Visual C++ 2005 Express supports OpenMP, but the Professional edition should. Alternatively, the free, as in beer, Microsoft compiler included in the Windows SDK supports OpenMP. Visual

Re: gomp slowness

2007-10-17 Thread skaller
On Thu, 2007-10-18 at 11:18 +1000, skaller wrote: > On Wed, 2007-10-17 at 10:09 -0700, Joe Buck wrote: > > On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote: > > > Hi, I have just run and timed a couple of tutorial examples for > > > openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a

Re: gomp slowness

2007-10-17 Thread skaller
On Wed, 2007-10-17 at 10:09 -0700, Joe Buck wrote: > On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote: > > Hi, I have just run and timed a couple of tutorial examples for > > openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core > > Athlon amd64, with OMP_NUM_THREADS set to 1

Re: gomp slowness

2007-10-17 Thread Ross Ridge
skaller writes: > Unfortunately no, unless MSVC++ in VS2005 has openMP. I don't know if Visual C++ 2005 Express supports OpenMP, but the Professional edition should. Alternatively, the free, as in beer, Microsoft compiler included in the Windows SDK supports OpenMP.

Re: gomp slowness

2007-10-17 Thread skaller
On Wed, 2007-10-17 at 10:09 -0700, Joe Buck wrote: > On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote: > > Hi, I have just run and timed a couple of tutorial examples for > > openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core > > Athlon amd64, with OMP_NUM_THREADS set to 1

Re: gomp slowness

2007-10-17 Thread skaller
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote: > skaller wrote: > It would be interesting to try with another compiler. Do you have access > to another OpenMP-enabled compiler? Unfortunately no, unless MSVC++ in VS2005 has openMP. I have an Intel licence but they're too tied up with co

Re: gomp slowness

2007-10-17 Thread Biagio Lucini
skaller wrote: > Hi, I have just run and timed a couple of tutorial examples for openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally 8. I found that 1 thread outperforms 2 by almost 2:1 on all the examples, and 8 i

Re: gomp slowness

2007-10-17 Thread Joe Buck
On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote: > Hi, I have just run and timed a couple of tutorial examples for > openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core > Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally > 8 I found that 1 thread outperfor

gomp slowness

2007-10-17 Thread skaller
Hi, I have just run and timed a couple of tutorial examples for openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally 8. I found that 1 thread outperforms 2 by almost 2:1 on all the examples, and 8 is only fractionall
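
For anyone wanting to repeat this kind of comparison, a minimal timing sketch follows. It is not the tutorial code that was measured; `parallel_sum`, `time_parallel_sum` and `now_seconds` are hypothetical names. It assumes wall-clock time is the quantity of interest and therefore uses `omp_get_wtime()` when built with `-fopenmp`. The `clock()` fallback is only used in a serial build, since on Linux `clock()` reports processor time added up over all threads and can make a multi-threaded run look slower than it is.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds under OpenMP; CPU seconds in a serial build. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Sums n doubles; each thread accumulates privately via reduction. */
double parallel_sum(const double *x, long n)
{
    double total = 0.0;
    long i;

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += x[i];
    return total;
}

/* Stores the sum in *sum_out and returns the elapsed seconds. */
double time_parallel_sum(const double *x, long n, double *sum_out)
{
    double start = now_seconds();
    *sum_out = parallel_sum(x, n);
    return now_seconds() - start;
}
```

Running the same binary with OMP_NUM_THREADS set to 1 and then 2 on a large array, and repeating each run a few times, gives numbers that can be compared directly.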