Re: gomp slowness

2007-10-20 Thread Tomash Brechko
I'm not sure what OpenMP spec says about default data scope (too lazy
to read through), but it seems that examples from
http://kallipolis.com/openmp/2.html assume default(private), while GCC
GOMP defaults to shared.  In your case,

  #pragma omp parallel for shared(A, row, col)
for (i = k+1; i

Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread Tomash Brechko
Hello,

I have a question regarding the thread-safeness of a particular GCC
optimization.  I'm sorry if this was already discussed on the list, if
so please provide me with the reference to the previous discussion.

Consider this piece of code:

extern int v;
  
void
f(int set_v)
{
  if (set_v)
v = 1;
}

If f() is called concurrently from several threads, then call to f(1)
should be protected by the mutex.  But do we have to acquire the mutex
for f(0) calls?  I'd say no, why, there's no access to global v in
that case.  But GCC 3.3.4--4.3.0 on i686 with -01 generates the
following:

f:
pushl   %ebp
movl%esp, %ebp
cmpl$0, 8(%ebp)
movl$1, %eax
cmove   v, %eax; load (maybe)
movl%eax, v; store (always)
popl%ebp
ret

Note the last unconditional store to v.  Now, if some thread would
modify v between our load and store (acquiring the mutex first), then
we will overwrite the new value with the old one (and would do that in
a thread-unsafe manner, not acquiring the mutex).

So, do the calls to f(0) require the mutex, or it's a GCC bug?

This very bug was actually already reported for a bit different case,
"Loop IM and other optimizations harmful for -fopenmp"
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862 ; please ignore my
last comment there, as I no longer sure myself).  But the report was
closed with "UNCONFIRMED" mark, and reasons for that are not quire
clear to me.  I tried to dig into the C99 standard and David
Butenhof's "Programming with POSIX Threads", and didn't find any
indication that call f(0) should be also protected by the mutex.

Here are some pieces from C99:

Sec 3.1 par 3: NOTE 2 "Modify" includes the case where the new value
   being stored is the same as the previous value.

Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
   objects.

Sec 5.1.2.3 par 3: In the abstract machine, all expressions are
   evaluated as specified by the semantics.

Sec 5.1.2.3 par 5 basically says that the result of the program
execution wrt volatile objects, external files and terminal output
should be the same for all confirming implementations.

Sec 5.1.2.3 par 8: EXAMPLE 1 An implementation might define a
   one-to-one correspondence between abstract and
   actual semantics: ...

Sec 5.1.2.3 par 9: Alternatively, an implementation might perform
   various optimizations within each translation unit,
   such that the actual semantics would agree with the
   abstract semantics only when making function calls
   across translation unit boundaries. ...

I think that the above says that even when compiler chooses to do some
optimizations, the result of the _whole execution_ should be the same
as if actual semantics equals to abstract semantics.  Sec 5.1.2.3 par
9 cited last is not a permission to do optimizations that may change
the end result.  In our case when threads are involved the result may
change, because there's no access to v in the abstract semantics, and
thus no mutex is required from abstract POV.


So, could someone explain me why this GCC optimization is valid, and,
if so, where lies the boundary below which I may safely assume GCC
won't try to store to objects that aren't stored to explicitly during
particular execution path?  Or maybe the named bug report is valid
after all?


Thanks in advance,

-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread Tomash Brechko
On Sun, Oct 21, 2007 at 17:26:02 +0200, Erik Trulsson wrote:
> Note that C99 is firmly based on a single-threaded execution model and says
> nothing whatsoever about what should happen or not happen in a threaded
> environment.  According to C99 a C compiler is allowed to generate such code
> as gcc does.

Yes, I understand that C99 doesn't concern threads per see, but I
wouldn't call it pro-single-threaded, rather thread-neutral.  I.e. the
standard isn't made explicitly incompatible with threads, it is
simply "says nothing about threads".


> If you are using some threaded environment then you will have to read the
> relevant standard for that to find out if it imposes any additional
> restricitions on a C compiler beyond what the C standard does.

All we have is POSIX, and it imposes very little on compiler I guess.


> I suspect that most of them will not say one way or the other about what
> should happen in this case, which means that you will have to assume the
> worst case and protect all calls to f() regardless of the value of the
> argument.

Well, assuming the worst case won't always work, that's why I asked
about reasonable boundary.  Consider the following (putting
style/efficiency matters aside):

  #include 

  #define N 100

  /* mutex[i] corresponds to byte[i].  */
  pthread_mutex_t mutex[N];
  char byte[N];

  void
  f(int i)
  {
pthread_mutex_lock(&mutex[i]);
byte[i] = 1;
pthread_mutex_unlock(&mutex[i]);
  }


Is this code thread-safe?  Because from some POV C99 doesn't forbid to
load and store the whole word when single byte[i] is accessed (given
that C99 is pro-single-threaded).

But if C99 is thread-neutral, then it's compiler's responsibility to
ensure the same result as some abstract machine (which may be
sequential).  In this case the compiler should access the single byte,
no more.


OK, I've got your point, but I'm not satisfied :).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 00:07:50 +0100, Dave Korn wrote:
>   Because of the 'as-if' rule.  Since the standard is neutral with regard to
> threads, gcc does not have to take them into account when it decides whether
> an optimisation would satisfy the 'as-if' rule.

If this would be true, then the compiler is free to inject the
sequence

  mov mem -> reg
  mov reg -> mem

just _anywhere_.  How the programmer can predict where and when to
lock the mutex to protect mem?  The only thing we could relay on then
is that the compiler is sound, it wouldn't inject such a sequence
unless it really feels so.  But still, how to determine when the
compiler really feels so?

Here's another piece of code, more real and sound this time:


  #include 

  static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
  static int acquires_count = 0;

  int
  trylock()
  {
int res;

res = pthread_mutex_trylock(&mutex);
if (res == 0)
  ++acquires_count;

return res;
  }


Is it thread safe?  Or rather, should the compiler preserve its
thread-safeness, as seen from the programmer's POV?  Otherwise I don't
get how pthread_mutex_trylock() could possibly ever be used, because
it's exactly the case when you _have_ to do the access based on the
condition, "assume the worst" won't work here.  GCC 4.3 with -O1
generates:

  trylock:
  pushl   %ebp
  movl%esp, %ebp
  subl$8, %esp
  movl$mutex, (%esp)
  callpthread_mutex_trylock
  cmpl$1, %eax; test res
  movlacquires_count, %edx; load
  adcl$0, %edx; maybe add 1
  movl    %edx, acquires_count; store
  leave
  ret


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 11:19:31 +0100, Andrew Haley wrote:
> Please have a read of [1].  Let us know if anything you have observed
> isn't covered in that paper.
> 
> [1] Hans-Juergen Boehm. Threads cannot be implemented as a library. In
> Proc. of the ACM SIGPLAN 2005 Conf. on Programming Language
> Design and Implementation (PLDI), pages 261?268, Chicago, IL, June
> 2005.

Unfortunately I'm not lucky enough to have ACM access.  But from the
Abstract:

  We provide specific arguments that a pure library approach, in which
  the compiler is designed independently of threading issues, cannot
  guarantee correctness of the resulting code.


Can't agree less!  That's why for _practical_ reasons I'd say GCC
should be thread-aware, even if _theoretically_ it doesn't have to.
And AFAIU it already _is_, for the most part of it.  That's why I want
to see Bug#31862 be confirmed, accepted, and fixed.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 14:50:44 +0400, Tomash Brechko wrote:
> Can't agree less!

Can't agree more!, that's what it was supposed to say, think you've
got it right ;).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 11:54:47 +0100, Dave Korn wrote:
> http://www.google.com/search?q=Threads+cannot+be+implemented+as+a+library&sour
> ceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org
> .mozilla:en-GB:official


Thanks!


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 12:07:20 +0100, Dave Korn wrote:
> And even volatile wouldn't help if the code said
> 
>   if (i > x)
> var += i;
> 
> instead of a simple assignment.  The race in fact *does* exist in the original
> program, but is hidden by the fact that you don't care which of two operations
> that overwrite the previous value complete in which order, but you're assuming
> the operation that modifies var is atomic, and there's nothing to innately
> guarantee that in the original program.  The race condition *is* already
> there.

Why?  For that example, if executed verbatim, it is either i > x
always false, or the mutex is properly acquired.  No one is assuming
atomic update.



-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 12:08:02 +0100, Andrew Haley wrote:
> Well, that's a big job: you'd have to decide on what a memory model
> really should be, and then implement that model.

Wouldn't the following rule of thumb work?: GCC is allowed to inject
additional store operations on some execution path only if there are
explicit store operations (i.e. issued by the user code if read
verbatim).

The whole problem will vanish if the last store that GCC adds will be
made conditional, like

   if (there_were_explicit_stores_already)
 store;

When execution do not get to basic blocks that have stores, GCC
shouldn't add any.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 12:19:40 +0100, Dave Korn wrote:
>   *What* mutex are you referring to?  There is no mutex in that code.

I was talking about the code in the comment#7.  For the code in the
comment#1, the piece is simply incomplete.  For it, mutex should be
used if x < 99, not clear if x >= 99.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 14:53:41 +0100, Dave Korn wrote:
> The optimisation the compiler is making here is a big win in normal
> code, you wouldn't want to disable it unless absolutely necessary;
> to be precise, you wouldn't want to automatically disable it for
> every loop and variable in a program that used -fopenmp just because
> /some/ of the variables in that program couldn't be safely accessed
> that way.

I'd rather wish the optimization would be done differently.  Currently
we have:

 mem -> reg;
   loop  loop
 if (condition)=> optimize =>  if (condition)
   val -> mem;   val -> reg;
 reg -> mem;


But it could use additional register and be:

 0 -> flag_reg;
 loop
   if (condition)
 val -> reg;
 1 -> flag_reg;
 if (flag_reg == 1)
   reg -> mem;


Note that by doing so we also eliminate all memory accesses when they
are not needed (when condition is never true), and memory bandwidth is
a major limiting factor nowadays.  Actually, for the very first code
piece of this thread I'd say that optimization


 mem -> reg;
   if (condition)   => optimize =>   if (condition)
 val -> mem;   val -> reg;
 reg -> mem;

(there's no loop) is actually a counter-optimization even in
single-threaded case: we replace a branch, which surely has its costs,
with unconditional memory load and store, which cost much more.  Even
if branching would flush CPU pipeline even when jump destination is
already in the pipeline (is this the case?), memory load has its own
quite big cost plus the cost of flushing one line from the cache just
to perform single operation on mem.

So, why not use flag_reg and thus make GCC thread-aware for this case?
I read the article suggested by Andrew Haley, its main point is that
the compiler should be made thread-aware.  Making all shared objects
volatile is an overkill, and is more a trick rather than a solution.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 18:15:35 +0200, Michael Matz wrote:
> > I'd rather wish the optimization would be done differently.  Currently
> > we have:
> > 
> >  mem -> reg;
> >loop  loop
> >  if (condition)=> optimize =>  if (condition)
> >val -> mem;   val -> reg;
> >  reg -> mem;
> > 
> > 
> > But it could use additional register and be:
> > 
> >  0 -> flag_reg;
> >  loop
> >if (condition)
> >  val -> reg;
> >  1 -> flag_reg;
> >  if (flag_reg == 1)
> >reg -> mem;
> 
> That could be done but would be besides the point.  You traded one 
> conditional store with another one, so you've gained nothing in that 
> transformation.

Rather I traded possibly many conditional stores in a loop with one
conditional store outside the loop.  And this exactly coincides with
the point of discussion: you can't go further, when you replace
conditional store with unconditional one, you introduce the race that
wasn't in the original code.

Several people already suggested to use volatile for shared data.
Yes, it will help because we know it will disable all access
optimizations, including thread-unaware ones.  But I don't want to
disable _all_ optimizations, I rather vote for thread-aware
optimizations.  There is no requirement in POSIX to make all shared
data volatile.  As the article referenced in the thread explains,
there is no agreement between POSIX and C/C++ wrt memory access.  But
should it be fixed in the compiler (as article suggests), or should
every shared data in every threaded program be defined volatile, just
for the case?  I never seen latter approach in any Open Source project
(though didn't look for it specifically), and many of them are
considered quite portable.

Again, we are not discussing some particular code sample, and how it
might be fixed, but the problem in general.  Should GCC do
thread-unsafe optimizations, or not?


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 18:33:37 +0100, Andrew Haley wrote:
> We do understand what you're saying, and simply repeating the same
> thing doesn't help.
> 
> I think we should wait to see what the C++ working group comes up with
> and consider implementing that, rather than some ad-hoc gcc-specific
> proposal.

Aha, but repeating worked.  This is the first time someone agrees that
the problem lies not entirely in the programmer's code.  Thank you!
:))


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-22 Thread Tomash Brechko
On Mon, Oct 22, 2007 at 18:48:02 +0100, Andrew Haley wrote:
> Err, not exactly.  :)
> 
> See http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/why_undef.html

Why, I'd say that page is about original races in the program, not
about what compiler should do with races that it introduces itself.

Still, "let's wait and see" is probably the best outcome that I can
expect from this discussion, so thanks anyway. ;)


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
Hello Bart,

Thanks for the summary.  There are good pointers in this e-mail thread
regarding the current state of the process of defining memory model
for C++ (and eventually for C I guess).

>From those pointers several conclusions may be made (which are in line
with that you said):

  - though neither Standard C nor POSIX require to use volatile, it
seems like you have to use it until the memory model is clearly
defined.

  - the compiler should not introduce speculative stores to the shared
objects.  This is what my original question was about.  I haven't
read all the papers yet, so one thing is still unclear to me: it
seems like atomic variables will be annotated as such
(atomic).  But I found no proposal for annotation of
non-atomic objects that are protected by the ordinary locks (like
mutexes).  Will the compiler be forbiden to do all speculative
stores, or how will it recognize shared objects as such?

  - the compiler should not cross object boundary when doing the store
(i.e. when storing to 8-bit char it should not store to the whole
32/64-bit word).  Here's the same question about shared object
annotation.


Cheers,

-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Fri, Oct 26, 2007 at 08:32:07 -0700, Ian Lance Taylor wrote:
> The language standard does not forbid speculative stores to non-atomic
> objects.

That's why there's a proposal to refine the language.  I was meaning
the folloing:

  http://www.artima.com/cppsource/threads_meeting.html:

  Hans Boehm and Herb Sutter both presented very detailed and
  well-thought out memory models. Their differences are subtle and
  important, but in broad strokes, both proposals paint a similar
  picture. In particular, both proposals:

  * Specify a set of atomic (aka, interlocked) primitive operations.
  * Explicitly specify the ordering constraints on atomic reads and writes.
  * Specify the visibility of atomic writes.
  * Disallow speculative stores on potentially shared objects.
  * Disallow reading and re-writing of unrelated objects. (For
instance, if you have struct S{ char a,b; }; it is not OK to
modify b by reading in the whole struct, bit-twiddling b, and
writing the whole struct because that would interfere with
another thread that is trying to write to a.)


So, will "potentially shared objects" be marked as such explicitly by
the programmer, or is it a compiler job to identify them?


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Fri, Oct 26, 2007 at 17:00:28 +0100, Dave Korn wrote:
> >   * Disallow speculative stores on potentially shared objects.
> >   * Disallow reading and re-writing of unrelated objects. (For
> > instance, if you have struct S{ char a,b; }; it is not OK to
> > modify b by reading in the whole struct, bit-twiddling b, and
> > writing the whole struct because that would interfere with
> > another thread that is trying to write to a.)
> 
>   I don't see how that second one is possible in the most general case.  Some
> cpus don't have all widths of access mode;

>From http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf:

  Fortunately, the original motivation for this lax specification
  seems to stem from machine architectures that did not support
  byte-wide stores.  To our knowledge, no such architectures are still
  in wide-spread multiprocessor use.


> and how could it possibly work for sub-world bitfields?  (Or are
> those just to be considered 'related'?)

How mutex-protected, or even atomic access to bit-fields could
possibly work?  Yes, they are related, or rather do not constitute a
separate object, but belong to one common.


>   Aren't we about to reinvent -fvolatile, with all the hideous performance
> losses that that implies?

It was already said that instead of disallowing all optimization with
volatile, the optimization itself may be made a bit differently.
Besides, the concern that it will hurt performance at large is a bit
far-stretched.  You still may speculatively store to automatic var for
which address was never taken, and this alone covers 50%--80% of
cases.  Only globals, or locals which address was passed to some
function, should be treated specially.  Also, for the case

  void
  f(int set_v, int *v)
  {
if (set_v)
  *v = 1;
  }

there's no load-maybe_update-store optimization, so there won't be
slowdown for such cases also (BTW, how this case is different from
when v is global?).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Fri, Oct 26, 2007 at 19:04:10 +0200, Michael Matz wrote:
> int f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp,
>   int *tpdm, int xmb, int *bp, int *ms)
> {
>   int k, sc;
>   for (k = 1; k <= M; k++)
> {
>   mc[k] = mpp[k-1]   + tpmm[k-1];
>   if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
>   if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
>   if ((sc = xmb  + bp[k]) > mc[k])  mc[k] = sc;
>   mc[k] += ms[k];
> }
> }

Aha, but the store in this example is _never_ speculative when
concurrency in concerned: you _explicitly_ store to mc[k] anyway, so
you may as well add some stores here and there.  If mc[] shared, it's
programmer's responsibility to protect it with the lock.

When you remove the first and the last lines inside the loop, then all
stores will become conditional.  But only one value will get to mc[k],
so there's no point in making the only store unconditional.  Note that
it doesn't cancel cmoves, as those are loads, not stores.

But look at the whole matter another way: suppose GCC implements some
optimization, really cool one, and users quickly find a lot of uses
for it.  But then it is discovered that this optimization is not
general enough, and in come cases wrong code is produced.  What would
you do?  Remove it?  But users will complain.  Ignore the matter?
Other users will complain.  But you may make it optional, like
-funsafe-math-optimizations or -funsafe-loop-optimizations, and
everyone is happy.

Our situation is a bit different, because 1) speculative store is not
a bug per see, 2) program classes where it can do harm
(mutli-threaded), and where it can not (single-threaded), are clearly
separable.  Alright, not entirely, because we don't know when and how
libraries are used.  But that is the case for -funsafe- options above
too.  Want safe library?  Compile with
-fno-thread-unsafe-optimizations, or specify that any user data
pointers to which are passed to the library should not be shared (at
least during the library call).


> >   void
> >   f(int set_v, int *v)
> >   {
> > if (set_v)
> >   *v = 1;
> >   }
> > 
> > there's no load-maybe_update-store optimization, so there won't be
> > slowdown for such cases also (BTW, how this case is different from
> > when v is global?).
> 
> The difference is, that 'v' might be zero, hence *v could trap, hence it 
> can't be moved out of its control region.  If you somehow could determine 
> that *v can't trap (e.g. by having a dominating access to it already) then 
> the transformation will be done.

Good point.  But how to tell the compiler that it is not NULL?  The
following doesn't work too:

  void
  f(int set_v, int v[1])
  {
if (set_v)
  v[0] = 1;
  }


  void
  g(int set_v, int *v) __attribute__((nonnull));

  void
  g(int set_v, int *v)
  {
if (set_v)
  *v = 1;
  }


Please note that I'm not trying to prove you wrong, just curious about
the reasons why there's no optimization.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Fri, Oct 26, 2007 at 21:45:03 +0400, Tomash Brechko wrote:
> Note that it doesn't cancel cmoves, as those are loads, not stores.

I just checked with x86 instruction reference, CMOVcc is reg->reg or
mem->reg, never reg->mem.  You know God's deed when you see it. :)


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Sat, Oct 27, 2007 at 03:06:21 +1000, skaller wrote:
> err .. what about the heap??

The heap are objects for which the addresses were taken.  So they can
be shared.  But I haven't yet seen that the optimization we discuss is
being applied to the object accessed though the pointer (see my reply
to Michael Matz).  Maybe this is just a coincidence.

I was beaten already for repeating myself, but please let me do that
once more :).  First, I have a strong believe (though I didn't test
it) that

  if (C)
val->mem;

runs faster than

  mem->reg;
  if (C)
val->reg;
  reg->mem;

(short) jump will cost less then unconditional load/store when they
are not needed (especially the store).

BTW, it would be interesting to measure if short jumps are as bad as
long jumps, i.e. whether CPU pipeline is flushed when jump target is
already in it.


Second, in situation like

  loop
if (C)
  val->mem;

i.e. when there are lots of conditional stores, only one final store
matters.  And current optimization employs this:

  mem->reg;
  loop
if (C)
  val->reg;
  reg->mem;// One final store.

But at the cost of additional register this final store can be made
conditional (there are cases when even that register is not needed,
but that requires thorough analysis of val's possible values, i.e. reg
could be initialized to some "invalid" value and then checked for it).

Registers are a valuable resource, yes.  But so is the correct program
result.  Since GCC is correct wrt all standards, next comes its
usability in not-yet-standardized domains.


> And what do you do if you do not KNOW what the storage class is,
> which is the case 99.99% of the time in C++ member functions?

I'm not quite sure what you mean here.  If extern vs static---that's
of no concern.  What matters is whether the object can possibly be
accessed from another thread, and this has nothing specific to C++.



-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-27 Thread Tomash Brechko
On Sat, Oct 27, 2007 at 09:25:09 +1000, skaller wrote:
> Yes, but with a class:
> 
>   struct X {
>   int x;
>   void f() { if (C) x = 1; }
>   void f2() { reg = x; if (c) reg = 1; x = reg; }
>   };

Hmm, indeed, and the example may end right here, you don't have to
allocate global X.  x member is "shared" among all X member functions,
so if both X::f() and X::f2() are called concurrently for the same
object without the lock, you are in trouble, even if you know only one
of them might modify the x for current conditions.

Since both f() and f2() implicitly get 'this' pointer, the situation
when "the address of some local var is taken" is more frequent then I
thought before, thanks for pointing this.

Then perhaps all unconditional speculative stores should be avoided
(unless there's also explicit unconditional store), without the need
to analize whether it is safe or not.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-28 Thread Tomash Brechko
On Sun, Oct 28, 2007 at 09:47:36 -0400, Robert Dewar wrote:
> Bart Van Assche wrote:
> 
> >Requiring that all thread-shared variables should be declared
> >volatile is completely unacceptable.
> 
> Why is this unacceptable .. seems much better to me than writing
> undefined stuff.

There's a parallel thread in the Linux Kernel Mailing List.  Everyone
is advised to read it, if not already.  There are several good points
there:

  - the problem is not limited to multithreaded domain: the page with
the object could be made read-only during execution, thus

   if (! page_is_read_only)
 v = 1;

would SIGSEGV for no apparent reason.

  - making things volatile is unacceptable from performance POV.

  - optimization in question might well turn out to be misoptimization
for anything but microbenchmarks (read LKML for cache flush/dirty
page issues).

  - "people knowledgeable in POSIX say that this optimization is
bogus".  I would add that though we may say that Standard C is not
aware of threads, POSIX _is_ aware of Standard C.  While POSIX
failed to solve the issue by formal word, its intent is clear: to
make POSIX Threads usable.  The compiler that claims to be POSIX
compatible should take this into account.

  - there's also a good talk on lawyer-ish vs attached-to-reality
approach.  I personally doubt those who continue to advise to use
volatile are actually writing such multithreaded programs.  Most
argue just for the fun of it.


> Well Hans is talking about C/C++, you are talking about some other
> language in which programs which do not have well defined semantics
> in C or C++ do have well defined semantics in your language.

Good thing we have this _bug_ in languages that define memory
semantics (Ada, Java), and no one yet argues that GCC should be fixed
wrt to only those languages.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-28 Thread Tomash Brechko
On Sun, Oct 28, 2007 at 17:51:57 +0100, Michael Matz wrote:
> I was merely showing that this transformation _does_ matter in some cases 
> to refute opposite claims which seemed to be expressed too airy in this 
> thread.

You got my intent all wrong.  Performance matters for both sides.  And
currently the only option for multithreaded programs is to use
volatile, which _greatly_ hurts performance.

What I was trying to say, is that it would be nice to have
-fno-thread-unsafe-optimization option.  And I was trying to say that
when you _enable_ this option, the performance won't be hurt much,
while the program will become thread-safe.  I never even said that
this option should be the default (though it would be reasonable for
-pthread or -fopenmp).  But there are obviously people who think
there's no need in such option whatsoever, because "threaded code is
broken by definition, and I don't write it anyway".

Even if mutithreading is of no immediate concern for you, it will
become tomorrow then you decide to run your loop on all 1024 cores
your cell phone provides.  So you can't argue that this option
wouldn't be nice to have, no?


And as I understood this discussion, there will be such option in GCC
in the nearest future.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-28 Thread Tomash Brechko
On Sun, Oct 28, 2007 at 21:03:09 +0300, Tomash Brechko wrote:
> What I was trying to say, is that it would be nice to have
> -fno-thread-unsafe-optimization option.

Rather clear -fno-speculative-store, in the light of mprotect() and
non-writable memory.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-28 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 02:39:15 -, Dave Korn wrote:
>   BTW, you and Tomash should get your stories in synch.  He says speculative
> loads are ok, just no stores, and wants a kind of half-volatile flag that
> would only suppress stores.  I think you're already looking one step further
> down the road than he is and have realised that speculative loads will give
> you problems too.

You don't do your homework.  This pointer
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html
(which was already posted in this thread) explains the matter, see
"Speculative code motion involving loads" section.  So both David and
me are correct.


But curious, Bart already tried _several times_ to explain why using
volatile is not an option, but his arguments seem to be too
"inconvenient" to be considered.  Let me repeat: suppose we agree that
every shared data should be annotated as volatile.  So if I want to
share dynamic data, I have to write


   _volatile_ data_type *pdata = malloc(size);


But how to use this data?  There are not many library functions that
accept pointer to volatile (and casting the qualifier away will bring
us back to the start).  Should every library function have 2^n copies
where different combinations of parameters are annotated as volatile?


I think most pro-volatile people didn't understood the meaning of
several papers in the Internet that say you have to use volatile.
Those papers never meant to say that volatile is a proper way to use
shared data with POSIX threads, rather that because the compilers are
made the way they are you have to use volatile for now to overcome
compiler thread-unawareness.


David R. Butenhof was the member of POSIX.1c (POSIX Threads)
committee.  In his book, "Programming with POSIX Threads", there are
no volatiles at all.  Of course one can say he didn't grok C, or even
POSIX, or POSIX Threads.  But it shows the intent, at least how he
felt it.

And this is the way to go: in sane world standards follow the reality,
not the other way around.  And they will, that's why the work of Hans
Boehm is there.  As it was already mentioned in this thread, while his
proposal is not final yet, most of the work is being done on atomics,
so it highly unlikely that "no-speculative-stores-please" requirement
will change.



-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 10:43:13 +0300, Tomash Brechko wrote:
> I think most pro-volatile people didn't understood the meaning of
> several papers in the Internet that say you have to use volatile.

And some don't understand the true purposes of volatile itself.  In
the code below


  volatile int *v = (int *) 0xdeadbeef;

  void
  f()
  {
int i;
for (i = 0; i < N; ++i)
  *v = 1;
  }


_all_ N stores matter.  Why?  Because v may point to the device I/O
port, and the device may _count_ those writes among other things.

But if *v is simply shared, do all stores to it matter?  No, only the
final value is relevant.

That's why -fno-speculative-store will never be equal to volatile, and
that's why it is needed to replace current volatile hammer.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 09:12:09 +0100, Eric Botcazou wrote:
> Define "final value".

The value that will be seen by other threads after they synchronize
memory (with pthread_mutex_lock(), for instance).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 01:08:22 -0700, Andrew Pinski wrote:
> On 10/29/07, Tomash Brechko <[EMAIL PROTECTED]> wrote:
> > But if *v is simply shared, do all stores to it matter?  No, only the
> > final value is relevant.
> 
> Actually it depends, it might matter.  If you have a loop checking the
> value of *v on a different thread and it does not change until this
> loop is done, then you end up with a wrong wait.  This is the same as
> what violatile is for really where it will change out side of the
> current thread.

Such program would be incorrect wrt POSIX Threads: you shouldn't read
the object that may be modified by another thread.  Such "wait" loop
is always wrong wrt POSIX Threads.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 11:42:10 +0300, Tomash Brechko wrote:
> It means that the current thread is free to cache the value in the
> register as long as it knows no other thread can access it (i.e. as
> long as it holds corresponding mutex).

And because your next question will be "how the compiler will know the
corresponding mutex", the answer is: it can't, that's why "opaque
function" rules come to play.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 09:31:13 +0100, Eric Botcazou wrote:
> > The value that will be seen by other threads after they synchronize
> > memory (with pthread_mutex_lock(), for instance).
> 
> What does it mean from the viewpoint of the current thread?

It means that the current thread is free to cache the value in the
register as long as it knows no other thread can access it (i.e. as
long as it holds corresponding mutex).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 09:50:16 +0100, Eric Botcazou wrote:
> Right, so please define more or less formally what the "final value" is from 
> the viewpoint of the current thread, this is the crux of the matter.

OK, formally there's no "final" value from current thread's POV, only
the "current" value.  "Final" only matters from other thread's POV,
like "this is the last value that was produced by another thread
before it released the mutex".

But shouldn't we formally define "define" first? :)


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 11:55:25 +0300, Tomash Brechko wrote:
> OK, formally there's no "final" value from current thread's POV, only
> the "current" value.  "Final" only matters from other thread's POV,
> like "this is the last value that was produced by another thread
> before it released the mutex".

Rather, "...before it released the mutex, and we acuired the same
mutex".  But it may be the same thread actually, so "final value" is
the value that is seen by the thread at the beginning of excusive
access to the object.  It is "final" wrt previous exclusive access to
this object.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 12:04:14 +0300, Tomash Brechko wrote:
> Rather, "...before it released the mutex, and we acuired the same
> mutex".  But it may be the same thread actually, so "final value" is
> the value that is seen by the thread at the beginning of excusive
> access to the object.  It is "final" wrt previous exclusive access to
> this object.

Note that this doesn't require the value to actually _be_ in the
memory, only to be observed as if it is there.  That's the power of
POSIX Threads, and that's why memory barriers, not cache flushes, are
behind pthread_mutex_lock() and friends.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 12:54:22 +0100, Andi Kleen wrote:
> See http://gcc.gnu.org/ml/gcc/2007-10/msg00607.html for a test case
> that shows where it can go horrible wrong (optimized code significantly 
> slower than unoptimized code) Admittedly it is a constructed
> one, but I don't think it is that unrealistic.

Thanks.  I had to change %Lu to %lu, and the example shows the point
when run multiple times.



-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 15:53:56 +0100, Michael Matz wrote:
> No it won't, because without further information GCC can't know that a 
> memory access won't trap.  Ergo it will not move it out of its control 
> region, exactly because it would potentially introduce traps where there 
> were none before.

Good reasoning, and that's exactly what some of us are asking for.
Please see the example:


  #include 
  #include 
  #include 


  int
  f(int read_only, int a[])
  {
int res = a[0];

if (! read_only)
  a[0] = 1;
  
return res;
  }


  int
  main(void)
  {
const long page_size = sysconf(_SC_PAGESIZE);

int *a1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
int *a2 = mmap(NULL, page_size, PROT_READ,
   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

f(0, a1);
f(1, a2);

fputs("GCC is the best compiler ever!\n", stdout);
  }


It gives:

  moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O0 mmap.c -o mmap
  moonlight:/tmp$ ./mmap
  GCC is the best compiler ever!
  moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O1 mmap.c -o mmap
  moonlight:/tmp$ ./mmap
  Segmentation fault


:-/


The discussion is not pointless, just please try to understand what
other people are trying to say.  No one is stupid, we all just not on
the same page yet.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 19:20:25 +0300, Tomash Brechko wrote:
> Good reasoning, and that's exactly what some of us are asking for.
> Please see the example:

At higher optimization levels GCC may inline f(), or not call it at
all, so below is a more complete case:


#include 
#include 
#include 


int
f(int read_only, int a[]) __attribute__((__noinline__));


int
f(int read_only, int a[])
{
  int res = a[0];

  if (! read_only)
a[0] = 1;
  
  return res;
}


int
main(void)
{
  int res;

  const long page_size = sysconf(_SC_PAGESIZE);

  int *a1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  int *a2 = mmap(NULL, page_size, PROT_READ,
 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  res += f(0, a1);
  res += f(1, a2);

  fputs("GCC is the best compiler ever!\n", stdout);

  return res;
}



-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 21:52:19 +0100, Michael Matz wrote:
> It is safe if there's another dominating store outside the control region.  
> Apart from that I reluctantly agree (i.e. it's not enough if there's any 
> dominating access through the pointer in question, it must be a store).

Thank you!  I almost started to think like I'm loosing grounds for my
claims :).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 20:37:52 +0100, Duncan Sands wrote:
> I don't see this with gcc 4.1 or 4.2.  Just a data point.

Yes, thanks for pointing this.  It fails with gcc (GCC) 4.3.0 20071021
(experimental) though.  It turns out that GCC 4.2 and below don't do
this optimization for pointers (even when known to be non-null).
Formally, POSIX requires mprotect() to work only on mmap()'ed regions,
which are accessed through pointers.  Technically you can make any
page read-only, including the one that holds globals, but this won't
pass GCC lawyers.

Still, I believe the example proves the general idea.  It shows that
speculative store is never safe, because every 'if' may be an 'if not
read-only'-one.  And if optimization is not being performed, then it's
only for good: the program is thread-safe, and disabling optimization
for other cases won't affect performance of pointer case.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Mon, Oct 29, 2007 at 22:30:20 +0100, Eric Botcazou wrote:
> See gcc/gthr-posix.h for a proper use of "volatile" for a shared access.

It was already shown that you can't use volatile in general case,
because you can't pass such data to any function.  See the mail of
Bart Van Assche.

The use doesn't become proper simply because it appears in the code,
even if in the code of GCC.  volatile might be used there for
completely different reasons.  Consider this comment:

  static volatile int __gthread_active = -1;

  ...

/* This test is not protected to avoid taking a lock on the main code
   path so every update of __gthread_active in a threaded program must
   be atomic with regard to the result of the test.  */
if (__builtin_expect (__gthread_active_latest_value < 0, 0))
  {
...


volatile + atomic update + cache-coherent system will indeed give you
the correct result, but such use is not POSIX-compliant, and I mostly
talk about POSIX Threads.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
I accidentally removed the essential line, it should be:

On Tue, Oct 30, 2007 at 10:44:52 +0300, Tomash Brechko wrote:
>   static volatile int __gthread_active = -1;
> 
>   ...

  int __gthread_active_latest_value = __gthread_active;

> /* This test is not protected to avoid taking a lock on the main code
>path so every update of __gthread_active in a threaded program must
>be atomic with regard to the result of the test.  */
> if (__builtin_expect (__gthread_active_latest_value < 0, 0))
>   {
> ...


But you knew it already ;).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-30 Thread Tomash Brechko
On Tue, Oct 30, 2007 at 10:59:24 +0300, Tomash Brechko wrote:
> On Tue, Oct 30, 2007 at 08:56:08 +0100, Eric Botcazou wrote:
> > > The use doesn't become proper simply because it appears in the code,
> > > even if in the code of GCC.  volatile might be used there for
> > > completely different reasons.
> > 
> > No, I put it there for this purpose.
> 
> Then you could remove it, if not for unlocked access.

Frankly, you realise the consequences of volatile access, you have
this comment:

  /* Avoid reading __gthread_active twice on the main code path.  */
  int __gthread_active_latest_value = __gthread_active;


Now, do you really believe that every multithreaded program should use
volatile, and then should copy shared data to temporal storage, just
because volatile is such a hammer?  You may have to, with current
compilers, but that's not what was supposed by POSIX.

-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-29 Thread Tomash Brechko
On Tue, Oct 30, 2007 at 08:56:08 +0100, Eric Botcazou wrote:
> > The use doesn't become proper simply because it appears in the code,
> > even if in the code of GCC.  volatile might be used there for
> > completely different reasons.
> 
> No, I put it there for this purpose.

Then you could remove it, if not for unlocked access.

-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-30 Thread Tomash Brechko
On Tue, Oct 30, 2007 at 09:20:28 +0100, Eric Botcazou wrote:
> No, I just wanted to point out that "volatile" has a well-defined semantics 
> and can be properly used for shared accesses.  In other words, it's not all
> or nothing like your earlier message[*] seemed to imply.
> 
> [*] http://gcc.gnu.org/ml/gcc/2007-10/msg00663.html

I didn't get your point.  Sure volatile can be used _along_ with
shared data.  But we can't say it _has_ to be used _for_ shared data.
I.e. if you require all shared data to be volatile, you can't pass
pointer to such data to any function without casting away the
qualifier.

volatile can be properly used _only_ if you also assume atomicity and
cache-coherence, and this is beyond POSIX.  But anyway, I'm proving
the opposite: when you use POSIX locks, you don't have to use
volatile, that it.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-30 Thread Tomash Brechko
I'd like to answer one last argument, mostly for the sake of curious
reader, because Michael himself has agreed with (at least the part of)
the point.


On Mon, Oct 29, 2007 at 16:00:18 +0100, Michael Matz wrote:
> The issue is, that people want to write this:
> 
>   if (condition)
> *p = value;
> 
> (i.e. without any synchronization primitive or in fact anything else after 
> the store in the control region) and expect that the store indeed only 
> happens in that control region.  And this expectation is misguided.  Had 
> they written it like:
> 
>   if (condition) {
> *p = value;
> membarrier();
>   }
> 
> it would have worked just fine.

Even if we put aside the fact that there's no such membarrier()
equivalent in POSIX bindings, this won't help.

First of all, let's note that you can't break the program by making it
_more_ ordered.  Indeed, program correctness doesn't depend on some
particular reordering (you can't predict it anyway), it depends only
on some particular ordering.  So we can rewrite

  if (condition) {
*p = value;
membarrier();
  }

as

  if (condition) {
*p = value;
membarrier();
  } else {
membarrier();
  }

But this is the same as

  if (condition)
*p = value;
  membarrier();

and we are back to the start: the store could me moved outside the
condition.  In general the following would work

  if (condition) {
*p = value;
opaque_function();
  }

because GCC has to assume that the call may access any memory, thus
store to *p can't be moved outside of the condition, because the call
itself can't be moved outside.  But such a construction can't be the
requirement for threaded programming.


In the original example there _were_ synchronization primitives
already, the complete piece is:


  if (condition)
pthread_mutex_lock(&mutex);

  ...

  if (condition)
*p = value;

  ...

  if (condition)
pthread_mutex_unlock(&mutex);


and POSIX doesn't require any additional ordering between lock() and
unlock().  When condition is false, any speculative store to *p is
bogus, because any condition is potentially a 'lock acquired'
condition (or 'not read-only' condition).  And it was shown that the
volatile qualifier can't be applied in general case.


But perhaps I'm the only one who is still unsure about the outcome of
this discussion :).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-30 Thread Tomash Brechko
On Tue, Oct 30, 2007 at 15:33:56 +0100, Eric Botcazou wrote:
> We're not talking about locks, see the example you gave in your
> first message.

Please read the _description_ that comes along with the code example.

Anyways, the patch is there.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-30 Thread Tomash Brechko
On Tue, Oct 30, 2007 at 07:50:04 -0700, Ian Lance Taylor wrote:
> Tomash Brechko <[EMAIL PROTECTED]> writes:
> 
> > Even if we put aside the fact that there's no such membarrier()
> > equivalent in POSIX bindings, this won't help.
> 
> In POSIX, any mutex function must be a membarrier.  For example, on
> x86, mutex lock and unlock more or less have to execute the mfence
> instruction.  If they don't, the program can see inconsistent data
> structures despite the mutex operations.

Yes, but you don't imply I should write

  if (condition) {
*p = value;
pthread_mutex_lock(&dummy):
pthread_mutex_unlock(&dummy):
  }

just to trigger it.


> >   if (condition) {
> > *p = value;
> > membarrier();
> >   } else {
> > membarrier();
> >   }
> > 
> > But this is the same as
> > 
> >   if (condition)
> > *p = value;
> >   membarrier();
> 
> No, it isn't.  If membarrier is not a general function call, then it
> has to be a magic function.  In gcc it is implemented using a volatile
> asm.

I didn't get your point, but probably you didn't get my either.  I was
talking about memory barriers as a whole, not a particular
implementation in GCC.  And my point is that you are free to inject
them wherever you like.  This will affect performance, but not
correctness.  Hence you can't be sure membarrier() won't be moved from
the condition.


> Note that I've committed my patch to avoid speculative stores to all
> active branches, so this particular case should be a non-issue going
> forward.  However, we all are going to have to take a careful look at
> gcc to make sure that it generally conforms to the C++0x memory model.

I'm not against ending this discussion.  As I understand the patch
(and I don't grok GCC internals), it fixes both read-only memory case,
and race case.  But it doesn't try to preserve the optimization in the
form that was suggested by Michael Matz (i.e. to use pointer to dummy
object on the stack), right?


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-30 Thread Tomash Brechko
On Tue, Oct 30, 2007 at 09:49:00 -0700, Ian Lance Taylor wrote:
> I don't know which suggestion you are referring to.  The patch I wrote
> will retain the optimization in the case where the memory location is
> unconditionally written later in the function.  This is most relevant
> in that the optimization can take place in a loop, if somewhere after
> the loop the memory location is unconditionally written.

OK, thanks for the description, I just couldn't build GCC after update
to see what result looks like.  And big Thank You for the patch!


-- 
   Tomash Brechko