On Sat, 2007-11-03 at 10:35 +0100, Sylvain Pion wrote:
> skaller wrote :
> > I can tell you I definitely considered using FS for the
> > Felix thread frame pointer to save passing that pointer
> > between every function..
>
> But then, won't you end up with an implementation very similar
> to __
skaller wrote :
I can tell you I definitely considered using FS for the
Felix thread frame pointer to save passing that pointer
between every function..
But then, won't you end up with an implementation very similar
to __thread??
--
Sylvain Pion
INRIA Sophia-Antipolis
Geometrica Project-Team
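Here is a minimal sketch of what Sylvain describes. It assumes GCC's __thread extension: the thread frame pointer lives in a thread-local variable instead of being passed to every function. The names struct thread_frame, current_frame, frame_enter and frame_counter_bump are hypothetical and do not come from Felix.

```c
#include <stddef.h>

/* Hypothetical per-thread frame, standing in for the Felix thread frame. */
struct thread_frame {
    long counter;
};

/* GCC __thread: each thread has its own copy of this pointer.
   On x86_64 Linux, gcc typically addresses it as an offset from
   the %fs segment base. */
static __thread struct thread_frame *current_frame = NULL;

/* Install the frame for the calling thread (e.g. at thread start). */
void frame_enter(struct thread_frame *f) {
    current_frame = f;
}

/* Callees reach the frame without taking it as a parameter. */
long frame_counter_bump(void) {
    return ++current_frame->counter;
}
```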
On Fri, Nov 02, 2007 at 11:09:33PM -0700, Ian Lance Taylor wrote:
> skaller <[EMAIL PROTECTED]> writes:
>
> > > As I said before, the register is only stolen for code which actually
> > > uses TLS.
> >
> > So scanning that document, for x86_64, fs is used in startup
> > code, presumably if, and o
skaller <[EMAIL PROTECTED]> writes:
> > As I said before, the register is only stolen for code which actually
> > uses TLS.
>
> So scanning that document, for x86_64, fs is used in startup
> code, presumably if, and only if, there is a linker section
> containing __thread variables?
Yes.
Ian
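Ian says that __thread variables are reached through a per-thread base, which is the %fs segment on x86_64 Linux. The small sketch below illustrates this. It assumes POSIX threads and GCC's __thread: two threads write the same variable, and each one reads back its own value from its own address. struct probe, probe_thread and run_probes are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

static __thread int tls_value;   /* one instance per thread */

struct probe {
    int seed;         /* value this thread writes */
    int seen;         /* value it reads back */
    uintptr_t addr;   /* address of this thread's tls_value */
};

static void *probe_thread(void *arg) {
    struct probe *p = arg;
    tls_value = p->seed;
    p->seen = tls_value;
    p->addr = (uintptr_t)&tls_value;
    return NULL;
}

/* Runs two threads concurrently; returns 0 on success, -1 if a
   thread could not be created (threads that did start are joined). */
int run_probes(struct probe out[2]) {
    pthread_t t[2];
    int started = 0;
    for (; started < 2; started++)
        if (pthread_create(&t[started], NULL, probe_thread, &out[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return started == 2 ? 0 : -1;
}
```

Both threads are still joinable when the addresses are taken, so each one owns a separate tls_value.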
On Fri, 2007-11-02 at 22:35 -0700, Ian Lance Taylor wrote:
> skaller <[EMAIL PROTECTED]> writes:
>
> > Neko, for example, uses a register. AFAIK MLton does the
> > same kind of thing. If gcc team thinks ANY register is free
> > to steal they'd be wrong -- that doesn't mean it shouldn't
> > be use
On Fri, 2007-11-02 at 23:56 -0400, Robert Dewar wrote:
> skaller wrote:
> You really can't be serious in your comment about fs, if you
> understand the architecture ...
You're just not thinking the same way I am. A CPU has state,
the compiler and application program manage that state.
If the co
skaller <[EMAIL PROTECTED]> writes:
> Neko, for example, uses a register. AFAIK MLton does the
> same kind of thing. If gcc team thinks ANY register is free
> to steal they'd be wrong -- that doesn't mean it shouldn't
> be used, just that it definitely is NOT free.
To be clear, it is not the gcc
On Fri, 2007-11-02 at 23:54 -0400, Robert Dewar wrote:
> skaller wrote:
> > On Fri, 2007-11-02 at 18:45 -0700, Andrew Pinski wrote:
> >>> This is not true. If you use a register for any purpose like this,
> >>> it can't be used for anything else and that has a cost.
> >> This is a segment register
skaller wrote:
This is not true. If you use a register for any purpose like this,
it can't be used for anything else and that has a cost.
On x86_64 which I use, every register is valuable. Don't you dare
take one away, it would have a serious performance impact AND
it would stop ME using that r
skaller wrote:
On Fri, 2007-11-02 at 18:45 -0700, Andrew Pinski wrote:
This is not true. If you use a register for any purpose like this,
it can't be used for anything else and that has a cost.
This is a segment register. Please go and read about what segment
registers.
I know how the x86 wo
On Fri, 2007-11-02 at 18:45 -0700, Andrew Pinski wrote:
> > This is not true. If you use a register for any purpose like this,
> > it can't be used for anything else and that has a cost.
>
> This is a segment register. Please go and read about what segment
> registers.
I know how the x86 works
On Sat, 2007-11-03 at 12:27 +1100, skaller wrote:
> On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote:
> Of course there is. It's called design by contract.
> I do it all the time. I am appalled at code bases like
> GTK and interfaces like OpenMP which get such really
> basic things wrong
On Fri, 2007-11-02 at 13:47 -0600, Joel Dice wrote:
> > So any (application) program needing TLS (other than the stack)
> > is automatically badly designed. I've been writing code for
> > three decades without using any global variables, ever since
> > I learned about re-entrancy.
>
> While I a
skaller <[EMAIL PROTECTED]> writes:
> On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote:
> > skaller <[EMAIL PROTECTED]> writes:
> >
> > > On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
> > > > skaller <[EMAIL PROTECTED]> writes:
> > >
> > > > In a C executable, TLS requires
On Fri, 2007-11-02 at 15:31 -0400, Robert Dewar wrote:
> Olivier Galibert wrote:
> There are lots of cases where global thread specific variables
> are useful in practice, ask anyone who has programmed real world
> large scale real time embedded programs.
No. And I have done just that myself. Th
> This is not true. If you use a register for any purpose like this,
> it can't be used for anything else and that has a cost.
This is a segment register. Please go and read about what segment
registers. They are not real registers and cannot be used for
anything except memory accesses. They da
On Fri, 2007-11-02 at 20:00 +0100, Olivier Galibert wrote:
> On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote:
> > My argument is basically: there is no need for any such
> > feature in a well written program. Each thread already has
> > its own local stack. Global variables should not be u
On Fri, 2007-11-02 at 19:56 +0100, Olivier Galibert wrote:
> On Sat, Nov 03, 2007 at 03:31:14AM +1100, skaller wrote:
> > On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
> > > I think you need to look at the TLS access code before deciding that
> > > it has bad performance.
> >
> > Yo
On Fri, 2007-11-02 at 10:29 -0700, Ian Lance Taylor wrote:
> skaller <[EMAIL PROTECTED]> writes:
>
> > On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
> > > skaller <[EMAIL PROTECTED]> writes:
> >
> > > In a C executable, TLS requires one extra machine register.
> >
> > You mean gc
On Sat, 3 Nov 2007, skaller wrote:
On Fri, 2007-11-02 at 10:46 -0400, Daniel Jacobowitz wrote:
On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote:
The only way I can interpret your comments is that you are assuming
that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed sh
Olivier Galibert wrote:
On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote:
My argument is basically: there is no need for any such
feature in a well written program. Each thread already has
its own local stack. Global variables should not be used
in the first place (except for signals etc
On Sat, Nov 03, 2007 at 03:38:51AM +1100, skaller wrote:
> My argument is basically: there is no need for any such
> feature in a well written program. Each thread already has
> its own local stack. Global variables should not be used
> in the first place (except for signals etc where
> there is no
On Sat, Nov 03, 2007 at 03:31:14AM +1100, skaller wrote:
> On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
> > I think you need to look at the TLS access code before deciding that
> > it has bad performance.
>
> You already said it costs a register? That's a REALLY high cost
> to pay t
skaller <[EMAIL PROTECTED]> writes:
> On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
> > skaller <[EMAIL PROTECTED]> writes:
>
> > In a C executable, TLS requires one extra machine register.
>
> You mean gcc?
I don't understand the question. I mean in a C/C++ executable which
use
On Fri, 2007-11-02 at 10:46 -0400, Daniel Jacobowitz wrote:
> On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote:
> > The only way I can interpret your comments is that you are assuming
> > that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed shared
> > library). But stac
On Fri, 2007-11-02 at 07:39 -0700, Ian Lance Taylor wrote:
> skaller <[EMAIL PROTECTED]> writes:
> In a C executable, TLS requires one extra machine register.
You mean gcc?
> TLS
> variables are accessed via offsets from that register. So what's the
> significant difference between that and
On Fri, Nov 02, 2007 at 07:39:33AM -0700, Ian Lance Taylor wrote:
> The only way I can interpret your comments is that you are assuming
> that all TLS is Global Dynamic (e.g., accessed from a dlopen'ed shared
> library). But stack based thread local storage won't work for
> dlopen'ed shared librar
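GCC's tls_model attribute lets you choose the TLS access models discussed here for each variable. The sketch below assumes it is compiled into an executable. Global-dynamic is the general model and also works from dlopen'ed libraries. Initial-exec and local-exec use shorter access sequences but carry restrictions: initial-exec generally assumes the module is loaded at program startup, and local-exec only suits variables defined in the executable itself. The variable names and bump_all are hypothetical.

```c
/* Per-variable TLS model selection (GCC attribute). The linker may
   still relax an access to a cheaper model when it can prove that
   is safe. */
static __thread int gd_var __attribute__((tls_model("global-dynamic")));
static __thread int ie_var __attribute__((tls_model("initial-exec")));
static __thread int le_var __attribute__((tls_model("local-exec")));

/* Touches each variable once so every access sequence is emitted. */
int bump_all(void) {
    return ++gd_var + ++ie_var + ++le_var;
}
```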
skaller <[EMAIL PROTECTED]> writes:
> A really cool (non-Posix) implementation would put TLS globals
> on the stack base .. but this does require at least one extra
> machine register in languages like C which don't provide
> a static display (pointer to parent function). For languages
> that do,
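Here is a sketch of the "TLS globals on the stack base" idea in plain C, which has no static display. The thread's entry function keeps a block of globals in its own frame and passes a pointer to it down every call. That pointer is the extra argument (and, in practice, register) skaller describes. All names are hypothetical. A real thread entry would be the function handed to pthread_create.

```c
/* Hypothetical "thread globals" block: it lives in the thread's own
   entry frame rather than in a TLS segment. */
struct thread_globals {
    int depth;
    long calls;
};

/* Every function needing the globals takes the pointer explicitly. */
static long leaf(struct thread_globals *g) {
    g->calls++;
    return g->depth;
}

static long walk(struct thread_globals *g, int n) {
    if (n == 0)
        return leaf(g);
    g->depth++;
    long r = walk(g, n - 1);
    g->depth--;
    return r;
}

/* Stand-in for a thread entry point: the globals sit at the base of
   this thread's stack for as long as the thread runs. */
long thread_main_sketch(int n, long *calls_out) {
    struct thread_globals g = { .depth = 0, .calls = 0 };
    long r = walk(&g, n);
    *calls_out = g.calls;
    return r;
}
```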
On Thu, 2007-11-01 at 21:02 -0700, Gary Funck wrote:
> On Thu, Oct 18, 2007 at 11:42:52AM +1000, skaller wrote:
> >
> > DO you know how thread local variables are handled?
> > [Not using Posix TLS I hope .. that would be a disaster]
>
> Would you please elaborate?
Sure ..
> What's wrong with
On Thu, Oct 18, 2007 at 11:42:52AM +1000, skaller wrote:
>
> DO you know how thread local variables are handled?
> [Not using Posix TLS I hope .. that would be a disaster]
Would you please elaborate? What's wrong with the
POSIX TLS implementation? Do you know of any studies?
I ask, because we
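For comparison with the compiler-level __thread mechanism, this is what the POSIX thread-specific data API looks like. The sketch assumes that "Posix TLS" means pthread_key_create and related calls; the thread does not say which mechanism was meant. Each access here is an ordinary library call to pthread_getspecific. bump_thread_counter is a hypothetical name.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    /* free() is run on a thread's value when that thread exits. */
    pthread_key_create(&counter_key, free);
}

/* Increments and returns the calling thread's counter, allocating it
   on first use; returns -1 if allocation fails. */
long bump_thread_counter(void) {
    pthread_once(&counter_once, make_key);
    long *c = pthread_getspecific(counter_key);
    if (c == NULL) {
        c = calloc(1, sizeof *c);
        if (c == NULL || pthread_setspecific(counter_key, c) != 0) {
            free(c);
            return -1;
        }
    }
    return ++*c;
}
```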
On Sat, 2007-10-20 at 22:32 +0400, Tomash Brechko wrote:
> I'm not sure what OpenMP spec says about default data scope (too lazy
> to read through),
> but it seems that examples from
> http://kallipolis.com/openmp/2.html assume default(private), while GCC
> GOMP defaults to shared. In your case
I'm not sure what OpenMP spec says about default data scope (too lazy
to read through), but it seems that examples from
http://kallipolis.com/openmp/2.html assume default(private), while GCC
GOMP defaults to shared. In your case,
#pragma omp parallel for shared(A, row, col)
for (i = k+1; i
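Tomash's point about default data scope can be made explicit with default(none). With that clause, the compiler rejects any variable whose sharing is not stated. Below is a minimal sketch of an in-place LU factorisation that assumes row-major storage and no pivoting. It is not skaller's code, which is not shown here. Declaring i, j and factor inside the loop makes them private. Built without -fopenmp, the pragma is ignored and the loop runs serially.

```c
/* In-place LU without pivoting (Doolittle). Afterwards the strict
   lower triangle of A holds L (unit diagonal implied) and the upper
   triangle holds U. A is n x n, row-major. */
void lu_inplace(int n, double *A) {
    for (int k = 0; k < n - 1; k++) {
        /* default(none): every variable's scope must be stated.
           i, j and factor are declared inside, hence private. */
        #pragma omp parallel for default(none) shared(A, n, k)
        for (int i = k + 1; i < n; i++) {
            double factor = A[i * n + k] / A[k * n + k];
            A[i * n + k] = factor;
            for (int j = k + 1; j < n; j++)
                A[i * n + j] -= factor * A[k * n + j];
        }
    }
}
```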
On 19 October 2007 02:45, tim prince wrote:
> skaller wrote:
>> On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote:
>>
>>> skaller wrote:
>>>
>>
>>
>>> I don't know of any OpenMP compiler which would correct the nesting of
>>> parallel loops in your LU. I have assumed that OpenMP doesn't all
skaller wrote:
On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote:
skaller wrote:
I don't know of any OpenMP compiler which would correct the nesting of
parallel loops in your LU. I have assumed that OpenMP doesn't allow
such optimization; you have to get it right yourself.
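A restructuring often suggested for this kind of loop nest is to create the thread team once, outside the k loop, and work-share only the i loop. That avoids opening a new parallel region for every k. This is an assumption on our part, not necessarily what Tim had in mind. The sketch again assumes row-major storage, no pivoting, and a hypothetical function name.

```c
/* LU without pivoting, with a single parallel region. Every thread
   runs the k loop; only the i loop is shared out. The implicit
   barrier at the end of each "omp for" ensures step k is complete
   before any thread starts step k+1. */
void lu_inplace_one_region(int n, double *A) {
    #pragma omp parallel default(none) shared(A, n)
    for (int k = 0; k < n - 1; k++) {
        #pragma omp for
        for (int i = k + 1; i < n; i++) {
            double factor = A[i * n + k] / A[k * n + k];
            A[i * n + k] = factor;
            for (int j = k + 1; j < n; j++)
                A[i * n + j] -= factor * A[k * n + j];
        }
    }
}
```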
On Thu, 2007-10-18 at 13:04 +0200, Jakub Jelinek wrote:
> On Thu, Oct 18, 2007 at 02:47:44PM +1000, skaller wrote:
> On LU_mp.c according to oprofile more than 95% of time is spent in the inner
> loop, rather than any kind of waiting. On quad core with OMP_NUM_THREADS=4
> all 4 threads eat 99.9%
On Thu, 2007-10-18 at 06:00 -0700, Tim Prince wrote:
> skaller wrote:
> I don't know of any OpenMP compiler which would correct the nesting of
> parallel loops in your LU. I have assumed that OpenMP doesn't allow
> such optimization; you have to get it right yourself.
Can you explain? This co
skaller wrote:
On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote:
skaller wrote:
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
skaller wrote:
It would be interesting to try with another compiler. Do you have access
to another OpenMP-enabled compiler
On Thu, Oct 18, 2007 at 02:47:44PM +1000, skaller wrote:
>
> On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote:
> > skaller wrote:
> > > On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
> > >> skaller wrote:
> > >
> > >> It would be interesting to try with another compiler. Do yo
Hi All,
I did some tests with GCC-4.2.2 (MinGW build) and the source code
provided by skaller.
The compilation log is as follows.
-- Build: Release in Test ---
[ 50.0%] mingw32-gcc.exe -Wall -fexceptions -fopenmp -O2
-IC:\MinGW\include -c C:\Projects\Test\combined_m
skaller wrote:
OK, attached.
Hi skaller,
I think I've wasted my money. They do not ship OpenMP headers and libs
with Standard Edition. :(
Best Regards,
Biplab
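Some compiler editions ship without OpenMP headers and libraries. The _OPENMP macro gives a portable way to keep the same source building either way. In this sketch, available_threads and openmp_version are hypothetical helper names. _OPENMP, omp.h and omp_get_max_threads come from the OpenMP specification.

```c
/* _OPENMP is defined (as a yyyymm date) only when OpenMP is enabled,
   e.g. gcc -fopenmp, so omp.h is included only in that case. */
#ifdef _OPENMP
#include <omp.h>
#endif

/* Threads a parallel region would use, or 1 without OpenMP. */
int available_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* OpenMP version date claimed by the compiler, or 0 without OpenMP. */
long openmp_version(void) {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```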
On Thu, 2007-10-18 at 12:02 +0800, Biplab Kumar Modak wrote:
> skaller wrote:
> > On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
> >> skaller wrote:
> >
> >> It would be interesting to try with another compiler. Do you have access
> >> to another OpenMP-enabled compiler?
> >
> > Unfort
skaller wrote:
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
skaller wrote:
It would be interesting to try with another compiler. Do you have access
to another OpenMP-enabled compiler?
Unfortunately no, unless MSVC++ in VS2005 has openMP.
I have an Intel licence but they're too ti
Ross Ridge wrote:
skaller writes:
Unfortunately no, unless MSVC++ in VS2005 has openMP.
I don't know if Visual C++ 2005 Express supports OpenMP, but the
Professional edition should. Alternatively, the free, as in beer,
Microsoft compiler included in the Windows SDK supports OpenMP.
Visual
On Thu, 2007-10-18 at 11:18 +1000, skaller wrote:
> On Wed, 2007-10-17 at 10:09 -0700, Joe Buck wrote:
> > On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote:
> > > Hi, I have just run and timed a couple of tutorial examples for
> > > openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a
On Wed, 2007-10-17 at 10:09 -0700, Joe Buck wrote:
> On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote:
> > Hi, I have just run and timed a couple of tutorial examples for
> > openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
> > Athlon amd64, with OMP_NUM_THREADS set to 1
On Wed, 2007-10-17 at 18:14 +0100, Biagio Lucini wrote:
> skaller wrote:
> It would be interesting to try with another compiler. Do you have access
> to another OpenMP-enabled compiler?
Unfortunately no, unless MSVC++ in VS2005 has openMP.
I have an Intel licence but they're too tied up with co
skaller wrote:
Hi, I have just run and timed a couple of tutorial examples for
openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally
8 I found that 1 thread outperforms 2 by almost 2:1 on all the examples,
and 8 i
On Thu, Oct 18, 2007 at 03:00:02AM +1000, skaller wrote:
> Hi, I have just run and timed a couple of tutorial examples for
> openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
> Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally
> 8 I found that 1 thread outperfor
Hi, I have just run and timed a couple of tutorial examples for
openMP using gcc (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4) on a dual core
Athlon amd64, with OMP_NUM_THREADS set to 1 and 2, and occasionally
8 I found that 1 thread outperforms 2 by almost 2:1 on all the examples,
and 8 is only fractionall
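When comparing thread counts, the choice of clock matters. clock() measures processor time used by the process, and on Linux that includes all of its threads, so a parallel run can look slower than it really is. The sketch below measures both wall-clock and processor time, assuming a POSIX system with clock_gettime. It is a measurement aid, not a diagnosis of the timings reported here. wall_seconds and cpu_seconds are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime in strict C modes */
#endif
#include <time.h>

/* Elapsed real time in seconds from a monotonic clock. */
double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Processor time used by the process in seconds (all threads). */
double cpu_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}
```

Time the parallel section with wall_seconds() before and after it. If cpu_seconds() grows roughly in proportion to the thread count while wall time shrinks, the threads are doing useful parallel work.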