Re: -fno-tree-cselim not working?

2007-10-26 Thread Jakub Jelinek
On Thu, Oct 25, 2007 at 09:40:32PM -0700, David Miller wrote:
> > Code like this needs to use volatile or explicit memory barriers.
> 
> I totally disagree with you, and I think POSIX does too.

Yeah.  See also http://gcc.gnu.org/PR31862
Unsafe optimizations in loop IM, if conversion, etc. really either need
to be conditionalized on some new switch saying whether code should be
thread safe or not, or just need to be disabled altogether.  Of course if
gcc can prove the variable is written later on anyway before reaching any
kind of barrier, it can do these optimizations anyway if it seems to be
worthwhile.  MEMs in current function's stack frame can be considered safe
as well.
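
For illustration, a minimal sketch (my own example, not taken from
PR31862 verbatim) of how store motion in loop invariant motion can turn a
conditional write into an unconditional one; 'flag' and 'g' are
hypothetical globals assumed to be shared between threads:

/* Sketch only: how loop store motion can introduce a speculative store. */
int g;

void f(int *flag, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (flag[i])
            g = i;          /* source writes g only when flag[i] is set */
}

/* After store motion the compiler may effectively generate:
 *
 *     int tmp = g;                   -- speculative load
 *     for (i = 0; i < n; i++)
 *         if (flag[i])
 *             tmp = i;
 *     g = tmp;                       -- unconditional store, racing with
 *                                       any other thread writing g
 */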

> Threaded programming is hard enough as it is.
> 
> What's the point of the lock if a test on whether we've obtained it or
> not can't be used to conditionalize code execution?
> 
> Any write to memory is a globally visible side effect, and even
> ignoring theading issues I bet similar cases can be constructed
> involving signals that behave equally unexpectedly and should not
> require bogus memory barriers or volatile.
> 
> I'm not sure people realize the true ramifications of this suggested
> "fix".  It is rediculious, by any measure.

Jakub


Re: -fno-tree-cselim not working?

2007-10-26 Thread skaller

On Thu, 2007-10-25 at 21:50 -0700, Ian Lance Taylor wrote:
> David Miller <[EMAIL PROTECTED]> writes:
> 
> > From: Ian Lance Taylor <[EMAIL PROTECTED]>

I'm having some trouble following this discussion.

In serial code, assume we permit a register to 
transiently alias a variable.

For this to be viable, the compiler must know all
the references to the variable, so it can either use the
register instead or store the value back into the variable.

Four things seem to govern this: rules on sequence points,
rules on permitted aliasing, volatile, and facts deduced by the 
compiler from analysing the context.

As I understand it volatile is typically used as a 'hint'
to the compiler that there could be aliases it cannot see.
This is independent of the use suggesting asynchronous changes
in a hardware port for example, although the effect is the same.

OK, now in POSIX with threading, we can use mutex locks to
serialise access to some variables. We have to assume that
if I lock, write, and unlock, and another thread does lock,
read, unlock, that the read/write operations are ordered:
the start of one operation must come after the completion
of the other (although which happens first isn't determined
without further investigation of the actual code).

So the rule must be that any write protected by a mutex
like this must be put somewhere that a protected read
will fetch that write, IF the read happens afterwards,
whether or not the variable is volatile.

I have to deduce from this that unlocking a mutex
must dump any values held in registers aliasing storage
back into that storage, as well as ensuring that a read
from the other thread, after it locks the mutex, will
fetch that value -- "as if" the write went through to
memory and the read came from memory
(even if the actual value never reaches memory).

So: accesses to any variable read or written
whilst protected by a lock are ordered with respect
to any other accesses protected by a lock in any
other thread, whether or not the variable is aliased
or volatile.

I'm deducing this from existing practice NOT any standards:
in this circumstance, volatile is NOT needed.

That leaves open several questions. One is: what can
we say about a write in thread A *prior* to a lock
being set and released, and a read in thread B
*after* a lock is set and released, where the read
is certain to follow the write temporally,
for a variable NOT accessed during the exclusion period
by either thread?

I believe this also must be guaranteed, volatile or not,
standards or not, by considering the trivial example
of storing a pointer in thread A, then reading it in
thread B, where the variable is mutexed. We have
to assume the storage the pointer POINTS AT is also
subject to the ordering constraint that applies to
the pointer, or most of the utility of
mutexed synchronisation is lost. For example, when mutexing
a queue where threads enqueue and dequeue data, properly
retrieving the pointer stored in the queue is useless if the
data it points at isn't guaranteed as well.

So I would have to conclude something like: if a mutex
synchronises a variable ROOT, then all variables REACHABLE
from ROOT must also be synchronised.

And in effect that means ALL variables must be synchronised,
unless the compiler is exceedingly smart!

AFAICS this means that the compiler must recognize 
mutex.unlock and dump registers into the variables they
alias at this point. The OS/library function then takes care
of cache synchronisation etc.
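
As a concrete illustration of the pattern I mean (a minimal sketch using
plain POSIX calls; 'shared' is deliberately NOT volatile):

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int shared;                 /* not volatile */

void writer(int v)
{
    pthread_mutex_lock(&m);
    shared = v;                    /* must leave any register copy ...    */
    pthread_mutex_unlock(&m);      /* ... and reach storage by this point */
}

int reader(void)
{
    int v;
    pthread_mutex_lock(&m);        /* must not reuse a stale cached copy */
    v = shared;
    pthread_mutex_unlock(&m);
    return v;
}

The expectation is that lock/unlock act as both compiler and memory
barriers, so the store in writer() is visible to reader() once it takes
the mutex.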

I'm not sure I understand so I'd welcome any feedback on this:
I don't see any role at all for volatile here. What I mean
is that we CANNOT require volatile on all that data, it would
be a pain and equivalent to making everything volatile,
defeating optimisations. Rather, the 'volatility' is temporal
and transient and associated with a mutex operation.

[BTW I think this sucks, the need to synchronise ALL memory
on mutexing is far too heavy]

-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


Re: -fno-tree-cselim not working?

2007-10-26 Thread Jakub Jelinek
On Fri, Oct 26, 2007 at 12:27:01AM -0700, David Miller wrote:
> From: Jakub Jelinek <[EMAIL PROTECTED]>
> Date: Fri, 26 Oct 2007 09:09:03 +0200
> 
> > MEMs in current function's stack frame can be considered safe as
> > well.
> 
> Unless they are passed by reference down into functions or
> a reference to them is assigned into an externally visible
> structure.

If they are passed by reference then the MEMs have some pointer
as address, rather than argp/frame/sp + offset.

Jakub


Re: -fno-tree-cselim not working?

2007-10-26 Thread David Miller
From: Jakub Jelinek <[EMAIL PROTECTED]>
Date: Fri, 26 Oct 2007 09:48:25 +0200

> On Fri, Oct 26, 2007 at 12:27:01AM -0700, David Miller wrote:
> > From: Jakub Jelinek <[EMAIL PROTECTED]>
> > Date: Fri, 26 Oct 2007 09:09:03 +0200
> > 
> > > MEMs in current function's stack frame can be considered safe as
> > > well.
> > 
> > Unless they are passed by reference down into functions or
> > a reference to them is assigned into an externally visible
> > structure.
> 
> If they are passed by reference then the MEMs have some pointer
> as address, rather than argp/frame/sp + offset.

Even in cases like:

typedef unsigned char spinlock_t;
extern int spin_trylock(spinlock_t *);
extern int spin_unlock(spinlock_t *);

struct foo {
   spinlock_t lock;
   int a;
};

extern void foo_register(struct foo *p);
extern void foo_unregister(struct foo *p);

void example(void)
{
        struct foo local;

        foo_register(&local);

        if (spin_trylock(&local.lock)) {
                local.a++;
                spin_unlock(&local.lock);
        }

        foo_unregister(&local);
}


I suspect that local.a++ will be represented as a frame relative
access.  Test compiles show that gcc does update the on-stack
copy on Sparc, now just to check if the conditional memory
operation cases trigger here too.



Re: -fno-tree-cselim not working?

2007-10-26 Thread David Miller
From: Jakub Jelinek <[EMAIL PROTECTED]>
Date: Fri, 26 Oct 2007 09:09:03 +0200

> MEMs in current function's stack frame can be considered safe as
> well.

Unless they are passed by reference down into functions or
a reference to them is assigned into an externally visible
structure.

We actually have cases in the Linux kernel where this happens
as wait queues are on the stack and they are locked objects.


Re: Removal of pre-ISO C++ items from include/backwards

2007-10-26 Thread Richard Guenther
On Thu, 25 Oct 2007, Joe Buck wrote:

> Has anyone checked yet on the impact on a Debian distribution of
> these proposed changes (and even for things that are checked in,
> they should only be thought of as "proposed" at this point)?

I re-built openSUSE with both changes and the ext/ stuff causes 62
build failures, while the .h header API removal causes 21 build failures.

Richard.


Re: Removal of pre-ISO C++ items from include/backwards

2007-10-26 Thread Gabriel Dos Reis
On Fri, 26 Oct 2007, skaller wrote:

| 
| On Thu, 2007-10-25 at 20:34 -0500, Gabriel Dos Reis wrote:
| > On Fri, 26 Oct 2007, skaller wrote:
| > 
| > | I should point out retaining 'old' features can create a
| > | significant maintenance burden for gcc developers,
| > 
| > In this specific case, what are they?
| 
| You're in a better position than me to determine that.
| I don't know: it's a generalisation from experience
| with half a dozen compiler development projects I track.

Yes, that is why I asked `in this specific case'.

I have no problem with leaving the `old' headers as they are without
adding new stuff to them -- not adding to those headers would break
less old or existing code than removing them would.
So, except for the mechanical annual copyright update, there is not much
those headers require us to do on a regular basis.  We don't need to
update them with newer allocation strategies, thread safety, default
allocators, etc.  The fact that they are not exactly like the tr1
hash containers is not an issue -- that is precisely why they are there:
for compatibility.

-- Gaby


Re: -fno-tree-cselim not working?

2007-10-26 Thread Andreas Schwab
Ian Lance Taylor <[EMAIL PROTECTED]> writes:

> The above code happens to use pthread_mutex_trylock, but there is no
> need for that.

pthread_mutex_trylock is special, because POSIX says it is a memory
synchronisation point (see section 4.10 Memory Synchronization).

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: problem with iv folding

2007-10-26 Thread Zdenek Dvorak
Hi,

> traceback, tt, and ops follow.  Why is this going wrong?

> [ gdb ] call debug_tree(arg0)
>   type  
> [ gdb ] call debug_tree(arg1)
>   type 

Autovectorized HIRLAM - latest results.

2007-10-26 Thread Toon Moene
In August, I wrote the following about vectorized / not vectorized loops 
in HIRLAM (see http://hirlam.org):


$ grep 'LOOP VECTORIZED' HL_Prepare_00.html  | wc -l
3273
$ grep 'not vectorized' HL_Prepare_00.html  | wc -l
7845


Yesterday, I performed a test with gcc-trunk-129472 (which is a week 
old, by now):


$ grep 'LOOP VECTORIZED' HL_Prepare_00.html | wc -l
5316
$ grep 'not vectorized' HL_Prepare_00.html | wc -l
6060

This was simply using -O3 -ffast-math on the code.  Apparently a lot of 
progress has been made (and perhaps the combination of loop 
optimizations enabled by -O3 over -O2 plays a role, too):


$ gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
$ gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
$ diff /tmp/O2-opts /tmp/O3-opts | grep enabled
>   -fgcse-after-reload [enabled]
>   -finline-functions  [enabled]
>   -fpredictive-commoning  [enabled]
>   -ftree-vectorize[enabled]
>   -funswitch-loops[enabled]

The timing difference is as follows:

-O3 -ffast-math:

$ grep 'FORECAST TOOK' HL_Cycle*
HL_Cycle_2006120100.html: FORECAST TOOK 6.2284 SECONDS
HL_Cycle_2006120100.html: FORECAST TOOK  2430.0159 SECONDS
HL_Cycle_2006120106.html: FORECAST TOOK   258.1721 SECONDS
HL_Cycle_2006120106.html: FORECAST TOOK 6.1164 SECONDS
HL_Cycle_2006120106.html: FORECAST TOOK   304.9590 SECONDS
HL_Cycle_2006120112.html: FORECAST TOOK   259.7802 SECONDS
HL_Cycle_2006120112.html: FORECAST TOOK 6.1524 SECONDS
HL_Cycle_2006120112.html: FORECAST TOOK  2303.5320 SECONDS
HL_Cycle_2006120112r.html: FORECAST TOOK   417.3861 SECONDS
HL_Cycle_2006120118.html: FORECAST TOOK   259.9763 SECONDS
HL_Cycle_2006120118.html: FORECAST TOOK 6.0764 SECONDS
HL_Cycle_2006120118.html: FORECAST TOOK   306.5071 SECONDS
HL_Cycle_2006120200.html: FORECAST TOOK   259.9482 SECONDS
HL_Cycle_2006120200.html: FORECAST TOOK 6.1564 SECONDS
HL_Cycle_2006120200.html: FORECAST TOOK  2300.3560 SECONDS
HL_Cycle_2006120200r.html: FORECAST TOOK   414.8299 SECONDS

-O2 -ffast-math:

$ grep 'FORECAST TOOK' HL_Cycle*
HL_Cycle_2006120100.html: FORECAST TOOK 6.3244 SECONDS
HL_Cycle_2006120100.html: FORECAST TOOK  2510.3809 SECONDS
HL_Cycle_2006120106.html: FORECAST TOOK   268.3368 SECONDS
HL_Cycle_2006120106.html: FORECAST TOOK 6.2484 SECONDS
HL_Cycle_2006120106.html: FORECAST TOOK   316.4918 SECONDS
HL_Cycle_2006120112.html: FORECAST TOOK   268.1648 SECONDS
HL_Cycle_2006120112.html: FORECAST TOOK 6.2724 SECONDS
HL_Cycle_2006120112.html: FORECAST TOOK  2377.2166 SECONDS
HL_Cycle_2006120112r.html: FORECAST TOOK   432.7510 SECONDS
HL_Cycle_2006120118.html: FORECAST TOOK   270.2049 SECONDS
HL_Cycle_2006120118.html: FORECAST TOOK 6.2244 SECONDS
HL_Cycle_2006120118.html: FORECAST TOOK   316.9878 SECONDS
HL_Cycle_2006120200.html: FORECAST TOOK   268.7688 SECONDS
HL_Cycle_2006120200.html: FORECAST TOOK 6.2924 SECONDS
HL_Cycle_2006120200.html: FORECAST TOOK  2371.8962 SECONDS
HL_Cycle_2006120200r.html: FORECAST TOOK   432.6790 SECONDS

The overall difference is roughly 3.3 %.

Kind regards,

--
Toon Moene - e-mail: [EMAIL PROTECTED] - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.indiv.nluug.nl/~toon/
GNU Fortran's path to Fortran 2003: http://gcc.gnu.org/wiki/Fortran2003


Re: alias and pointers analysis

2007-10-26 Thread Diego Novillo
On 10/26/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:

> What is the matter if the 'b' var. is unused and
> optimally removed by the SSA algorithm?

In this case, it will not be removed.  If any of the p_i pointers is
ever dereferenced in this code, that will be considered a use of
variable 'b'.


> int a;
> int b;
>
> a = 2;
> p_4 = phi(a)

Is this 'phi' as in a PHI function or a function in your code?  If the
former, then it's wrong, you can never have such a phi function in
this code snippet.

> // b doesn't used here
> if (...)
>   p_1 = &a;
> else
>   p_2 = &b;
> endif
> p_3 = phi (p_1, p_2)
>
> points-to (p_1) = { a }
> points-to (p_2) = { b }
> points-to (p_3) = { a b }
>
> In this case, should exist hidden p_5 = phi(b) although 'b' is not used
> but yes used his reference to phantom cell 'b'. It's weird for me.

I recommend that you read about the SSA form.  PHI nodes are special
constructs that exist only where multiple control flow paths reach a
common join node.  The getting started section of the wiki has links
to books and articles about it.  Morgan's book on compiler
optimization is fairly good.
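
For reference, a minimal sketch (hand-written, not actual GCC dump
output) of where a PHI node would legitimately appear for the snippet
above:

int a, b;
int *p;

void example(int cond, int c)
{
    a = 2;            /* a_1 = 2; no PHI is needed here                  */
    if (cond)
        p = &a;       /* p_1 = &a        points-to(p_1) = { a }          */
    else
        p = &b;       /* p_2 = &b        points-to(p_2) = { b }          */
                      /* join point: p_3 = PHI <p_1, p_2>                */
                      /*             points-to(p_3) = { a, b }           */
    *p = c;           /* *p_3 = c may define either a or b, so this      */
                      /* counts as a use of 'b' even though 'b' is never */
                      /* mentioned by name after its declaration.        */
}

The PHI exists only for 'p', at the point where the two arms of the 'if'
meet; there is no "hidden phi(b)" anywhere.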

> I've not idea WHERE put "hidden p_5 = phi(b)"!

No such thing exists.


> Too it's possible to ocurr *p_2 = c where 'b' will be hidden used through
> the pointer p_2. It's too weird for me.

Yes, that is possible, and that is precisely what alias analysis tells
the compiler.  We know from the analysis that reading/writing to
'*p_2' is exactly the same as reading/writing to 'b'.


Re: GCC 4.3 release schedule

2007-10-26 Thread Richard Guenther
On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
>
> Now that GCC 4.3 is in stage 3, I think we'd all be in agreement that it
> would be nice to keep this stage short and get a release out.
>
> We are interested in using 4.3 as the system compiler in Fedora 9, and
> as such, we'd like to nail down some time lines and requirements with
> release management and the steering committee.

Of course it doesn't work like this ;)

> The timelines involved are something like this:  (clearly anything
> earlier would be great :-)
>
> Nov 14th - we'd like to start building F9 with a 4.3 compiler. Ideally
> we'd have a branch cut no later than that and starting to stabilize
> without much new code going in.

I am opposed to cutting a branch without immediately releasing 4.3.0.  This
just doesn't work - just look at the history.  I also think that Nov 14th is
_very_ optimistic (I was personally thinking of an early Q1 release).

Note that we are building what-becomes-openSUSE 11.0 with current trunk
in parallel to 4.2 at the moment; switching to 4.3 will happen in the next
weeks (unless I get pushed back again ;)).

> Dec 14th - viability decision on using 4.3 as the F9 compiler.  There is
> a window here as late as Jan 14th, but the opinions will start forming
> in Dec. There shouldn't be any ABI changes from this point on.
> Feb 29th - Optimistic target date for a 4.3 release.  can slip as much
> as a  month,  but clearly the earlier the release happens the better.

If we branch too early I bet we will not make even the Feb 29th date.

How does targeting a release around Feb 1st sound (that is, without
branching before, a RC1 on Feb 1st, the release including branching
a week or two later)?  It looks like that would work for your constraints.

> I don't recall seeing any other timeline requirements, does this seem
> like a reasonable target schedule? Once a decision is made to use 4.3 by
> mid-jan, it becomes very difficult to back out if something happens to
> the release date, so it becomes quite important that the final criteria
> is well understood by then and appears reachable.  If something
> unforeseen happens late in the cycle to delay the release, its also
> important that we can get some sort of steering committee agreement on
> what to do so we don't have some sort of evil gcc offspring as happened
> once before.

Once again - GCC development and release planning doesn't work this way.
Instead, you (and I and others) are supposed to make the release "happen"
by fixing bugs and fulfilling the release criteria.  A "date" wasn't a
release criterion in the past and I don't think it should become one
(in fact, the only time a release "date" would be a welcome criterion is
if the release is a throw-away and releasing would allow us to work on
the next stage1 again).

> Thats something I don't expect to happen, but will have to
> visit as risk management before the final decision is made.   My hope is
> that it'll be in good enough shape by mid january that slippage by that
> much is unlikely and isn't an issue.
>
> Does this seem like a reasonable schedule?  Can we set the criteria for
> what a final release would look like?  We're committed to applying
> engineering resource to help make it happen.

Again, if you commit enough engineering resources to make it happen, then
I'm sure we'll make your timeline.  If not, well - shit happens.  At least
I wouldn't like to see us locked into some agreement around release dates.
After all, this doesn't work for glibc (ok, _that_ probably works for
Red Hat) or the kernel either.

Just my $1000M dollars.

Again - please don't branch without releasing.
(repeat 1000 times).

Thanks,
Richard.


GCC 4.3 release schedule

2007-10-26 Thread Andrew MacLeod


Now that GCC 4.3 is in stage 3, I think we'd all be in agreement that it 
would be nice to keep this stage short and get a release out.


We are interested in using 4.3 as the system compiler in Fedora 9, and 
as such, we'd like to nail down some time lines and requirements with 
release management and the steering committee.


The timelines involved are something like this:  (clearly anything 
earlier would be great :-)


Nov 14th - we'd like to start building F9 with a 4.3 compiler. Ideally 
we'd have a branch cut no later than that and starting to stabilize 
without much new code going in.
Dec 14th - viability decision on using 4.3 as the F9 compiler.  There is 
a window here as late as Jan 14th, but the opinions will start forming 
in Dec. There shouldn't be any ABI changes from this point on.
Feb 29th - Optimistic target date for a 4.3 release.  It can slip by as 
much as a month, but clearly the earlier the release happens the better.


I don't recall seeing any other timeline requirements, so does this seem 
like a reasonable target schedule?  Once a decision is made to use 4.3 by 
mid-January, it becomes very difficult to back out if something happens to 
the release date, so it becomes quite important that the final criteria 
are well understood by then and appear reachable.  If something 
unforeseen happens late in the cycle to delay the release, it's also 
important that we can get some sort of steering committee agreement on 
what to do so we don't have some sort of evil gcc offspring as happened 
once before.  That's something I don't expect to happen, but we will have 
to visit it as risk management before the final decision is made.  My hope 
is that it'll be in good enough shape by mid-January that slippage by that 
much is unlikely and isn't an issue.


Does this seem like a reasonable schedule?  Can we set the criteria for 
what a final release would look like?  We're committed to applying 
engineering resource to help make it happen.


Andrew






Re: -fno-tree-cselim not working?

2007-10-26 Thread Samuel Tardieu
> "David" == David Miller <[EMAIL PROTECTED]> writes:

David> I suspect that local.a++ will be represented as a frame relative
David> access.  Test compiles show that gcc does update the on-stack
David> copy on Sparc,

It does on i386 too.

David> now just to check if the conditional memory operation cases
David> trigger here too.

It doesn't because you have the spin_unlock() in the "then" part so it
is no longer a single assignment (the only case that gets "optimized"
currently).

  Sam
-- 
Samuel Tardieu -- [EMAIL PROTECTED] -- http://www.rfc1149.net/



Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Bart Van Assche
On 10/21/07, Tomash Brechko <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have a question regarding the thread-safeness of a particular GCC
> optimization.  I'm sorry if this was already discussed on the list, if
> so please provide me with the reference to the previous discussion.
>
> Consider this piece of code:
>
> extern int v;
>
> void
> f(int set_v)
> {
>   if (set_v)
> v = 1;
> }
>
> If f() is called concurrently from several threads, then call to f(1)
> should be protected by the mutex.  But do we have to acquire the mutex
> for f(0) calls?  I'd say no, why, there's no access to global v in
> that case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the
> following:
>
> f:
> pushl   %ebp
> movl%esp, %ebp
> cmpl$0, 8(%ebp)
> movl$1, %eax
> cmove   v, %eax; load (maybe)
> movl%eax, v; store (always)
> popl%ebp
> ret
>
> Note the last unconditional store to v.  Now, if some thread would
> modify v between our load and store (acquiring the mutex first), then
> we will overwrite the new value with the old one (and would do that in
> a thread-unsafe manner, not acquiring the mutex).
>
> So, do the calls to f(0) require the mutex, or it's a GCC bug?
...
> So, could someone explain me why this GCC optimization is valid, and,
> if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during
> particular execution path?  Or maybe the named bug report is valid
> after all?

Hello Tomash,

I'm not an expert in the C89/C99 standards, but I have written a Ph.D.
on the subject of memory models. What I learned during writing that
Ph.D. is the following:

- If you want to know which optimizations are valid and which ones are
not, you have to look at the semantics defined in the language
standard.

- Every language standard document defines what the result is of
executing a sequential program. The definition of the behavior of a
multithreaded program written in a certain programming language is
called the memory model of that programming language.

- The memory model of C and C++ is still under discussion as has
already been pointed out on this mailing list.

- Although the memory model for C and C++ is still under discussion,
there is a definition for the behavior of multithreaded C and C++
programs. The following is required by the ANSI/ISO C89 standard (from
paragraph 5.1.2.3, Program Execution):
  Accessing a volatile object, modifying an object, modifying a file, or
  calling a function that does any of those operations are all side
  effects, which are changes in the state of the execution environment.
  Evaluation of an expression may produce side effects.  At certain
  specified points in the execution sequence called sequence points, all
  side effects of previous evaluations shall be complete and no side
  effects of subsequent evaluations shall have taken place.  (A summary
  of the sequence points is given in annex C.)

In annex C it is explained that, among other things, the call to a
function (after argument evaluation) is a sequence point.

See also http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf

- The above paragraph does not impose any limitation on the compiler
with regard to optimizations on non-volatile variables.  Or: the
generated code shown in your mail is allowed by the above paragraph.
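
Written back at the source level (my reading of the assembly above, not
literal compiler output), the transformation amounts to:

extern int v;

void f(int set_v)
{
    int tmp = set_v ? 1 : v;   /* conditional load    ("load (maybe)")   */
    v = tmp;                   /* unconditional store ("store (always)") */
}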

- The above paragraph also has the following implications for volatile
variables:
  * There exists a total order for all accesses to all volatile variables.
  * It is the responsibility of the compiler to ensure cache coherency
    for volatile variables.  If memory barrier instructions are needed to
    ensure cache coherency on the architecture the compiler is generating
    code for, then it is the responsibility of the compiler to generate
    these instructions for volatile variables.  This fact is often
    overlooked.
  * The compiler must generate code such that exactly one store
    statement is executed for each assignment to a volatile variable.
    Prefetching volatile variables is allowed as long as it does not
    violate paragraph 5.1.2.3 of the language definition.
  * As is known, the compiler may reorder function calls and assignments
    to non-volatile variables if the compiler can prove that the called
    function won't modify that variable.  This becomes problematic if the
    variable is modified by more than one thread and the called function
    is a synchronization function, e.g. pthread_mutex_lock().  This kind
    of reordering is highly undesirable.  This is why any variable that
    is shared over threads has to be declared volatile, even when using
    explicit locking calls.

I hope the above brings more clarity in this discussion.

Bart Van Assche.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Bart Van Assche
On 10/22/07, Andrew Haley  wrote:

> The core problem here seems to be that the "C with threads" memory
> model isn't sufficiently well-defined to make a determination
> possible.  You're assuming that you have no responsibility to mark
> shared memory protected by a mutex as volatile, but I know of nothing
> in the C standard that makes such a guarantee.  A prudent programmer
> will make conservative assumptions.

I agree that according to the C and C++ language standards, any
variable shared over threads should be declared volatile. But it is
nearly impossible to live with this requirement: this requirement
implies that for each library function that modifies data through
pointers a second version should be added that accepts a volatile
pointer instead of a regular pointer. Consider e.g. the function
snprintf(), which writes to the character buffer passed as its first
argument. When snprintf() is used to write to a buffer that is not
shared over threads, the existing snprintf() function is fine. When
however snprintf() is used to write to a buffer that is shared by two
or more threads, a version is needed of snprintf() that accepts
volatile char* as its first argument.

My opinion is that it should be possible to declare whether C/C++ code
has acquire, release or acquire+release semantics. The fact that code
has acquire semantics means that no subsequent load or store
operations may be moved in front of that code, and the fact that code
has release semantics means that no preceding load or store operations
may be moved past that code. Adding definitions for acquire and
release semantics in pthread.h would help a lot. E.g.
pthread_mutex_lock() should be declared to have acquire semantics, and
pthread_mutex_unlock() should be declared to have release semantics.
Maybe it is a good idea to add the following function attributes in
gcc: __attribute__((acquire)) and __attribute__((release)) ? A
refinement of these attributes would be to allow to specify not only
the acquire/release attributes, but also the memory locations to which
the acquire and release apply (pthreads synchronization functions
always apply to all memory locations).
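
Purely as a sketch of what usage could look like (these attributes do
not exist in GCC today; the declarations below are hypothetical):

#include <pthread.h>

/* Hypothetical annotations -- __attribute__((acquire)) and
 * __attribute__((release)) are proposed here, not implemented. */
extern int my_mutex_lock(pthread_mutex_t *m)   __attribute__((acquire));
extern int my_mutex_unlock(pthread_mutex_t *m) __attribute__((release));

/* With such declarations the compiler would not be allowed to move loads
 * or stores upward past my_mutex_lock() nor downward past
 * my_mutex_unlock(), even for non-volatile shared variables. */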

I'm not inventing anything new here -- as far as I know the concepts
of acquire and release were first defined by Gharachorloo e.a. in 1990
(Memory consistency and event ordering in scalable shared-memory
multiprocessors, International Symposium on Computer Architecture,
1990, http://portal.acm.org/citation.cfm?id=325102&dl=ACM&coll=GUIDE).

Bart Van Assche.


Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
David Miller <[EMAIL PROTECTED]> writes:

> This is not a game or some fun theoretical discussion about language
> semantics.  People will be harmed and lose lots of their own personal
> time debugging these kinds of things if GCC generates code like this.
> It's unreasonable, regardless of what the standards say.  Sometimes
> the standards are wrong or fail to guide the implementation in these
> grey areas, and GCC should do what's best for the users in these
> cases.  And I believe that this means to not do conditional
> computations on memory even though it might be more efficient in some
> situations.

I have no objection to implementing the sort of change you want as an
option, if it can clearly specified.

I'm going to repeat that this optimization is not new.  As far as I
know, nobody has complained about it for the several years that it has
been implemented.  That does not mean that I think we should keep it.
It just means that I think you are reaching for the hyperbole button
without justification.

The gcc compiler is under considerable pressure from many people to
deliver better performance.  The kernel programmers have different
requirements than most programmers do.  Since gcc tries to be the
compiler for everybody, it has to weigh significantly different
concerns from many people.  You are describing this situation as
though there is only one side to the argument.  I do not believe that
is the case.

Ian


Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
David Miller <[EMAIL PROTECTED]> writes:

> Even in cases like:
> 
> typedef unsigned char spinlock_t;
> extern int spin_trylock(spinlock_t *);
> extern int spin_unlock(spinlock_t *);
> 
> struct foo {
>spinlock_t lock;
>int a;
> };
> 
> extern void foo_register(struct foo *p);
> extern void foo_unregister(struct foo *p);
> 
> void example(void)
> {
>   struct foo local;
> 
>   foo_register(&local);
> 
>   if (spin_trylock(&local.lock)) {
>   local.a++;
>   spin_unlock(&local.lock);
>   }
> 
>   foo_unregister(&local);
> }
> 
> 
> I suspect that local.a++ will be represented as a frame relative
> access.  Test compiles show that gcc does update the on-stack
> copy on Sparc, now just to check if the conditional memory
> operation cases trigger here too.

This code isn't going to be a problem, because spin_unlock presumably
includes a memory barrier.

The cases which are problems are the ones in which there is no memory
barrier, as in the original example.

Ian


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
Hello Bart,

Thanks for the summary.  There are good pointers in this e-mail thread
regarding the current state of the process of defining memory model
for C++ (and eventually for C I guess).

From those pointers several conclusions may be made (which are in line
with what you said):

  - though neither Standard C nor POSIX require to use volatile, it
seems like you have to use it until the memory model is clearly
defined.

  - the compiler should not introduce speculative stores to the shared
objects.  This is what my original question was about.  I haven't
read all the papers yet, so one thing is still unclear to me: it
seems like atomic variables will be annotated as such
(atomic).  But I found no proposal for annotation of
non-atomic objects that are protected by the ordinary locks (like
mutexes).  Will the compiler be forbidden from doing any speculative
stores, or how will it recognize shared objects as such?

  - the compiler should not cross the object boundary when doing a store
(i.e. when storing to 8-bit char it should not store to the whole
32/64-bit word).  Here's the same question about shared object
annotation.


Cheers,

-- 
   Tomash Brechko


Re: GCC 4.3 release schedule

2007-10-26 Thread Andrew MacLeod

Richard Guenther wrote:

On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
  

Now that GCC 4.3 is in stage 3, I think we'd all be in agreement that it
would be nice to keep this stage short and get a release out.

We are interested in using 4.3 as the system compiler in Fedora 9, and
as such, we'd like to nail down some time lines and requirements with
release management and the steering committee.



Of course it doesn't work like this ;)

  
we can at least make projected dates known so we have something firmer 
than "at some point in the future" :-)

The timelines involved are something like this:  (clearly anything
earlier would be great :-)

Nov 14th - we'd like to start building F9 with a 4.3 compiler. Ideally
we'd have a branch cut no later than that and starting to stabilize
without much new code going in.



I oppose to cut a branch without immediately releasing 4.3.0.  This just
doesn't work - just look at the history.  I also think that Nov 14th is _very_
optimistic (I personally was thinking about a (early) Q1 release).

  


I got the impression from Mark's latest note that he was planning to cut a 
4.3 branch when we got down to 100 P1s.



We're still in Stage 3 for GCC 4.3.  As before, I think a reasonable
target for creating the release branch is less than 100 open
regressions. 



This was where I was going with the Nov 14th date as a latest target.  
So you are saying we shouldn't branch at this point? 

I'd like to see some stabilization... i.e., not hunks of new functionality. 

Early Q1 for 4.3.0 works for us as a release date.  I had no idea anyone 
else cared when 4.3 goes out.




Note that we are building what-becomes-openSUSE 11.0 with current trunk
in parallel to 4.2 at the moment, switching to 4.3 is in the next weeks
(unless I get pushed back again ;)).

  
It would be excellent to have both openSUSE and fedora pounding on 4.3 
at the same time.

Dec 14th - viability decision on using 4.3 as the F9 compiler.  There is
a window here as late as Jan 14th, but the opinions will start forming
in Dec. There shouldn't be any ABI changes from this point on.
Feb 29th - Optimistic target date for a 4.3 release.  can slip as much
as a  month,  but clearly the earlier the release happens the better.



If we branch too early I bet we will not make even the Feb 29th date.

How does targeting a release around Feb 1st sound (that is, without
branching before, a RC1 on Feb 1st, the release including branching
a week or two later)?  It looks like that would work for your constraints.

  
Having an RC by the beginning of February seems reasonable to me :-)
Mid-January would be even better.

I don't recall seeing any other timeline requirements, does this seem
like a reasonable target schedule? Once a decision is made to use 4.3 by
mid-jan, it becomes very difficult to back out if something happens to
the release date, so it becomes quite important that the final criteria
is well understood by then and appears reachable.  If something
unforeseen happens late in the cycle to delay the release, its also
important that we can get some sort of steering committee agreement on
what to do so we don't have some sort of evil gcc offspring as happened
once before.



Once again - GCC development and release planning doesn't work this way.
But instead you (and me and others) are supposed to make the release "happen"
by fixing bugs and fulfilling the release criteria.  A "date" wasn't a
release criteria
in the past and I don't think it should become one (in fact, the only
time a release
"date" would be a welcome criteria is if the release is a throw-away
and releasing
would allow us to work on next stage1 again).

  
I understand, but without some sort of date guideline, you can't plan on 
using an as-yet-unreleased compiler.  It buggers up the release cycle.  I 
don't propose the date as a hard release criterion, but as a target that 
we'd like to work towards.  So what I'd like to see is what the final 
release requirements are; then along the way we can monitor the current 
status and, if we are falling behind, try to apply more resources to get 
back on track.

Thats something I don't expect to happen, but will have to
visit as risk management before the final decision is made.   My hope is
that it'll be in good enough shape by mid january that slippage by that
much is unlikely and isn't an issue.

Does this seem like a reasonable schedule?  Can we set the criteria for
what a final release would look like?  We're committed to applying
engineering resource to help make it happen.



Again, if you commit enough engineering resource to make it happen, then I'm
sure we'll make your timeline.  If not, well - shit happens?.  At
least I wouldn't
like to see us to be locked in into some agreement around release dates.  After
all this doesn't work for glibc (ok, _that_ probably works for RedHat)
or the kernel
either.

  
Right. so I'm not looking to lock into Feb 29th as a rock hard rele

Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
Jakub Jelinek <[EMAIL PROTECTED]> writes:

> Unsafe optimizations in loop IM, if conversion, etc. really either need
> to be conditionalized on some new switch whether code should be thread
> safe or not, or just need to be disabled altogether.  Of course if
> gcc can prove the variable is written later on anyway before reaching any
> kind of barrier, it can do these optimizations anyway if it seems to be
> worthwhile.  MEMs in current function's stack frame can be considered safe
> as well.

If you can write down the rules, then I expect that people can figure
out how to implement them in the IR.  The rules could be something
like "never move a memory write out of a conditionally executed basic
block."  It would probably be helpful to look at the draft C++0x
memory model here--although that memory model would permit this
optimization, since the variable in question was not annotated.

The rules are going to have to be implemented as additional
dependencies.  If these dependencies are not in the IR, then the
various optimizers are going to go ahead and break them.

The reason explicit memory barriers, and explicit use of volatile,
avoid this problem is that they introduce additional dependencies.

I'm going to repeat that the optimization which started this all--the
use of a conditional add and an unconditional store--is not new.  This
is not some disaster that the compiler programmers have suddenly
sprung on you.  It's been there for years.

I'm also going to repeat that most people (other than kernel
programmers) are not going to want the new dependencies.  But that's
OK--if we can figure out how to implement it, we can make it an
option.

Ian


Re: -fno-tree-cselim not working?

2007-10-26 Thread skaller

On Fri, 2007-10-26 at 07:58 -0700, Ian Lance Taylor wrote:
> skaller <[EMAIL PROTECTED]> writes:
> 
> > As I understand it volatile is typically used as a 'hint'
> > to the compiler that there could be aliases it cannot see.
> > This is independent of the use suggesting asynchronous changes
> > in a hardware port for example, although the effect is the same.
> 
> Not really.  Volatile has a reasonably precise definition: it means
> that memory accesses must be done precisely as they are written in the
> program.  They may not be coalesced, and they may not be moved.

I understand that's the common meaning .. but the semantics
aren't specified for ISO C/C++.


> > And in effect that means ALL variables must be synchronised,
> > unless the compiler is exceedingly smart!
> 
> No, it means that the implementation of mutex must be magic.

Same thing isn't it?

> 
> > [BTW I think this sucks, the need to synchronise ALL memory
> > on mutexing is far too heavy]
> 
> It can not be avoided, for the reasons you describe.

Of course it can be avoided easily if the memory model
allowed for local synchronisation sets, so the real problem
appears to be that Posix doesn't provide proper synchronisation
control.

I guess that's why 'volatile' appears attractive: one *might*
say that accesses to volatiles are atomic and ordered, without
implying anything about other memory (I'm not suggesting that,
just commenting that it would provide a way to reduce synchronisation
burden). I.e. a volatile variable acts as a kind of 'port' between
threads which act like 'processes'.

In fact, volatiles mapped onto hardware registers fronting async
devices really ARE doing just that.

-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Robert Dewar

Andrew Haley wrote:


Hmmm.  This is an interesting idea, but it sounds to me as though it's
somewhat at variance with what is proposed by the C++ threads working
group.  In any case, gcc will certainly implement whatever the
standards committees come up with, but that is probably two years
away.


One problem at the language standards level is that you can't easily
talk about loads and stores, since you are defining an as-if semantic
model, and if you make a statement about loads and stores, any other
sequence which behaves as if that sequence were obeyed is allowed. In
the absence of a notion of threads at the semantic level it's difficult
to say what you mean in a formal way. In the Ada standard, we get
around this problem by having sections called "implementation advice",
which in practice are treated as requirements, but we can use language
that is not formally sound, even though everyone knows what we mean.
Of course in Ada there is a clear notion of threads semantic, and
a clear definition of what the meaning of code is in the presence
of threads, so the specific situation discussed here is easy to
deal with (though Ada takes the view that unsynchronized shared access to
non-atomic or non-volatile data from separate threads has undefined
effects).



RE: -fno-tree-cselim not working?

2007-10-26 Thread Dave Korn
On 26 October 2007 15:59, Ian Lance Taylor wrote:

> Andreas Schwab <[EMAIL PROTECTED]> writes:
> 
>> Ian Lance Taylor <[EMAIL PROTECTED]> writes:
>> 
>>> The above code happens to use pthread_mutex_trylock, but there is no
>>> need for that.
>> 
>> pthread_mutex_trylock is special, because POSIX says it is a memory
>> synchronisation point (see section 4.10 Memory Synchronization).
> 
> Sure.  But the argument that gcc is doing something wrong stands up
> just fine even if we just test a global variable.  The argument that gcc
> is doing something wrong does not rely on the fact that the function
> called is pthread_mutex_trylock.

  Indeed; as I understand it, what the argument that gcc is doing something
wrong relies on is the incorrect assumption that *all* variables ought to
behave like volatile ones, i.e. have an exact one-to-one relationship between
loads and stores as written in the HLL and actual machine load/store
operations.  I could describe that another way: what the argument that gcc is
doing something wrong relies on is the misapprehension that C is still just a
glorified assembler language - which it hasn't been since the '80s, as far as
I know.

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



RE: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Dave Korn
On 26 October 2007 16:15, Robert Dewar wrote:

> One problem at the language standards level is that you can't easily
> talk about loads and stores, since you are defining an as-if semantic
> model, and if you make a statement about loads and stores, any other
> sequence which behaves as if that sequence were obeyed is allowed. 

  Well, that's precisely the problem - specifically in the context of
memory-mapped I/O registers - that volatile was invented to solve.  It may
never have been clearly defined in the formal language of the specs, but I
thought it was pretty clear in intent: the compiler will emit exactly one
machine load/store operation for any rvalue reference/lvalue assignment
(respectively) in the source, at the exact sequence point in the generated
code corresponding to the location of the reference in the source.  Any other
variable may be accessed more or fewer times than is written, and may be
accessed at places other than exactly where the reference is written in the
source, subject only to the as-if rule.
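
A minimal sketch of the memory-mapped I/O case in question (0x40001000
is a made-up register address, purely illustrative):

#define STATUS_REG (*(volatile unsigned int *)0x40001000)

int wait_ready(void)
{
    /* Each iteration must perform a real load: the compiler may not
     * hoist the read out of the loop or coalesce repeated reads. */
    while ((STATUS_REG & 0x1) == 0)
        ;
    return 1;
}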


cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
skaller <[EMAIL PROTECTED]> writes:

> As I understand it volatile is typically used as a 'hint'
> to the compiler that there could be aliases it cannot see.
> This is independent of the use suggesting asynchronous changes
> in a hardware port for example, although the effect is the same.

Not really.  Volatile has a reasonably precise definition: it means
that memory accesses must be done precisely as they are written in the
program.  They may not be coalesced, and they may not be moved.


> So: accesses to any variable read or written
> whilst protected by a lock are ordered with respect
> to any other accesses protected by a lock in any
> other thread, whether or not the variable is aliased
> or volatile.

Yes.

> I'm deducing this from existing practice NOT any standards:
> in this circumstance, volatile is NOT needed.

That is correct: it's not needed if there is a lock, access, unlock
sequence.  In the example which started this brouhaha, there was no
unlock.


> That leaves open several questions. One is: what can
> we say about a write in thread A *prior* to a lock
> being set and released, and a read in thread B
> *after* a lock is set and released, where the read
> is certain to be following the write temporaly,
> for a variable NOT accessed during the exclusion period
> by either thread?

A correct mutex implementation is required to use compiler specific
magic to implement a memory barrier.  In gcc, this magic is implemented
via an inline asm.  This memory barrier will impose the natural
ordering requirements.
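
For gcc the usual spelling of the compiler-level part of that magic is
an empty asm with a "memory" clobber (a sketch; a real mutex also needs
the CPU-level barrier or atomic instruction, which this does not show):

#define barrier() __asm__ __volatile__("" : : : "memory")

extern int shared;

void update(int v)
{
    shared = v;
    barrier();   /* the store above may not be moved below this point,
                  * and cached copies of memory values are invalidated */
}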


> And in effect that means ALL variables must be synchronised,
> unless the compiler is exceedingly smart!

No, it means that the implementation of mutex must be magic.


> [BTW I think this sucks, the need to synchronise ALL memory
> on mutexing is far too heavy]

It can not be avoided, for the reasons you describe.

Ian


Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
Andreas Schwab <[EMAIL PROTECTED]> writes:

> Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> 
> > The above code happens to use pthread_mutex_trylock, but there is no
> > need for that.
> 
> pthread_mutex_trylock is special, because POSIX says it is a memory
> synchronisation point (see section 4.10 Memory Synchronization).

Sure.  But the argument that gcc is doing something wrong stands up
just fine even if we just test a global variable.  The argument that gcc
is doing something wrong does not rely on the fact that the function
called is pthread_mutex_trylock.

Ian


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Andrew Haley
Bart Van Assche writes:
 > On 10/22/07, Andrew Haley  wrote:
 > 
 > > The core problem here seems to be that the "C with threads" memory
 > > model isn't sufficiently well-defined to make a determination
 > > possible.  You're assuming that you have no responsibility to mark
 > > shared memory protected by a mutex as volatile, but I know of nothing
 > > in the C standard that makes such a guarantee.  A prudent programmer
 > > will make conservative assumptions.
 > 

...

 > My opinion is that it should be possible to declare whether C/C++
 > code has acquire, release or acquire+release semantics. The fact
 > that code has acquire semantics means that no subsequent load or
 > store operations may be moved in front of that code, and the fact
 > that code has release semantics means that no preceding load or
 > store operations may be moved past that code. Adding definitions
 > for acquire and release semantics in pthread.h would help a
 > lot. E.g.  pthread_mutex_lock() should be declared to have acquire
 > semantics, and pthread_mutex_unlock() should be declared to have
 > release semantics.

Hmmm.  This is an interesting idea, but it sounds to me as though it's
somewhat at variance with what is proposed by the C++ threads working
group.  In any case, gcc will certainly implement whatever the
standards committees come up with, but that is probably two years
away.

Right now the question is whether or not gcc will produce thread-safe
code according to some memory model, rather than any specific details
about what that model should be.

IMO, we need to move rapidly towards tracking the proposed model from
the C++ threads working paper.  This would at least provide a
reasonably sane model that corresponds with most thread and kernel
programmers' understanding.

Andrew.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Ian Lance Taylor
"Bart Van Assche" <[EMAIL PROTECTED]> writes:

>   * As known the compiler may reorder function calls and assignments
> to non-volatile variables if the compiler can prove that the called
> function won't modify that variable. This becomes problematic if the
> variable is modified by more than one thread and the called function
> is a synchronization function, e.g. pthread_mutex_lock(). This kind of
> reordering is highly undesirable. This is why any variable that is
> shared over threads has to be declared volatile, even when using
> explicit locking calls.

What happens in practice is that pthread_mutex_lock and friends are
magic functions.  In gcc, this magic is implemented using inline
assembler constructs.

Ian


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Ian Lance Taylor
Tomash Brechko <[EMAIL PROTECTED]> writes:

>   - the compiler should not introduce speculative stores to the shared
> objects.  This is what my original question was about.  I haven't
> read all the papers yet, so one thing is still unclear to me: it
> seems like atomic variables will be annotated as such
> (atomic).  But I found no proposal for annotation of
> non-atomic objects that are protected by the ordinary locks (like
> mutexes).  Will the compiler be forbidden from doing any speculative
> stores, or how will it recognize shared objects as such?

In practice, gcc will provide a variable attribute to mark the
variable as atomic.

The language standard does not forbid speculative stores to non-atomic
objects.

Ian


Re: -fno-tree-cselim not working?

2007-10-26 Thread Andrew Haley
Ian Lance Taylor writes:

 > As I understand it, the draft C++0x memory model has acquire release
 > semantics for annotated variables.  Of course, it wouldn't help the
 > original test case unless the global variable was annotated.

Mmm, but one of the authors of the draft C++0x memory model tells me
that the controversial optimization gcc is performing is definitely
illegal under that model, regardless of how the variables are
annotated.  I haven't yet got deep enough into the working paper to be
able to point you at exactly where it says so.

Andrew.


Re: GCC 4.3 release schedule

2007-10-26 Thread Richard Guenther
On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
> Richard Guenther wrote:
> > On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
> >>
> >> Nov 14th - we'd like to start building F9 with a 4.3 compiler. Ideally
> >> we'd have a branch cut no later than that and starting to stabilize
> >> without much new code going in.
> >>
> >
> > I oppose to cut a branch without immediately releasing 4.3.0.  This just
> > doesn't work - just look at the history.  I also think that Nov 14th is 
> > _very_
> > optimistic (I personally was thinking about a (early) Q1 release).
> >
> >
>
> I got the impression from marks latest note that he was planing to cut a
> 4.3 branch when we got down to 100 P1s.
>
> >We're still in Stage 3 for GCC 4.3.  As before, I think a reasonable
> >target for creating the release branch is less than 100 open
> >regressions.
>
>
> This was where I was going with the nov 14th date as a latest target.
> So you are saying we shouldn't branch at this point?

Yes.  While I saw Mark's status report, I also saw us running away from
completing the 4.2.0 release into stage1 after the branch was cut.

If we think 4.3.0 is ready at the point we reach 100 open regressions
then we should release it.  There is no point in branching but not
releasing.  If we don't think 4.3.0 is ready at that point, then the 100
open regressions is the wrong metric.  (I'd rather use zero P1 bugs
as metric)

Given that both Novell and RedHat want to use 4.3 for their next
community release I think working towards the release and branching
and releasing at the same point will work out.

> I'd like to see some stabilization... Ie, not hunks of new functionality.

I agree.  There still seem to be some "late features" going in, while
I'd rather people spent their time on fixing bugs ...
(but it doesn't work that way ;)).  But technically stage3 is bugfixes
and documentation only - we just need to enforce this more strictly.

> Early Q1 for 4.3.0 works for us for a release. I had no idea any one
> else cared when 4.3 goes out.

Well, I'm pretty sure that we'll make the Q1 deadline, so I didn't bother
too much to try to set it in stone ;)

> > Note that we are building what-becomes-openSUSE 11.0 with current trunk
> > in parallel to 4.2 at the moment, switching to 4.3 is in the next weeks
> > (unless I get pushed back again ;)).
> >
> >
> It would be excellent to have both openSUSE and fedora pounding on 4.3
> at the same time.

Yes.  I think Ubuntu is on track for 4.3 as well, most likely Debian, too.

> >> Dec 14th - viability decision on using 4.3 as the F9 compiler.  There is
> >> a window here as late as Jan 14th, but the opinions will start forming
> >> in Dec. There shouldn't be any ABI changes from this point on.
> >> Feb 29th - Optimistic target date for a 4.3 release.  can slip as much
> >> as a  month,  but clearly the earlier the release happens the better.
> >>
> >
> > If we branch too early I bet we will not make even the Feb 29th date.
> >
> > How does targeting a release around Feb 1st sound (that is, without
> > branching before, a RC1 on Feb 1st, the release including branching
> > a week or two later)?  It looks like that would work for your constraints.
> >
> >
> having a RC by the beginning of february seems reasonable to me  :-) mid
> january would be even better.

Yeah - though I expect the usual holiday lack of interest.

> >> I don't recall seeing any other timeline requirements, does this seem
> >> like a reasonable target schedule? Once a decision is made to use 4.3 by
> >> mid-jan, it becomes very difficult to back out if something happens to
> >> the release date, so it becomes quite important that the final criteria
> >> is well understood by then and appears reachable.  If something
> >> unforeseen happens late in the cycle to delay the release, its also
> >> important that we can get some sort of steering committee agreement on
> >> what to do so we don't have some sort of evil gcc offspring as happened
> >> once before.
> >>
> >
> > Once again - GCC development and release planning doesn't work this way.
> > But instead you (and me and others) are supposed to make the release 
> > "happen"
> > by fixing bugs and fulfilling the release criteria.  A "date" wasn't a
> > release criteria
> > in the past and I don't think it should become one (in fact, the only
> > time a release
> > "date" would be a welcome criteria is if the release is a throw-away
> > and releasing
> > would allow us to work on next stage1 again).
> >
> >
> I understand, but without some sort of date guideline, you can't plan on
> using an as-yet-unreleased compiler. It buggers up the release cycle. I
> don't propose the date as a hard release criterion, but as a target that
> we'd like to work towards. So what I'd like to see is what the final
> release requirements are, and then along the way we can monitor the
> current status and, if we are falling behind, try to apply more
> resources to get back on track.
>

Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
skaller <[EMAIL PROTECTED]> writes:

> On Fri, 2007-10-26 at 07:58 -0700, Ian Lance Taylor wrote:
> > skaller <[EMAIL PROTECTED]> writes:
> > 
> > > As I understand it volatile is typically used as a 'hint'
> > > to the compiler that there could be aliases it cannot see.
> > > This is independent of the use suggesting asynchronous changes
> > > in a hardware port for example, although the effect is the same.
> > 
> > Not really.  Volatile has a reasonably precise definition: it means
> > that memory accesses must be done precisely as they are written in the
> > program.  They may not be coalesced, and they may not be moved.
> 
> I understand that's the common meaning .. but the semantics
> aren't specified for ISO C/C++.

Sure they are.  In the C99 standard look at the definition of sequence
points (5.1.2.3) and the definition of volatile (6.7.3).


> > > [BTW I think this sucks, the need to synchronise ALL memory
> > > on mutexing is far too heavy]
> > 
> > It can not be avoided, for the reasons you describe.
> 
> Of course it can be avoided easily if the memory model
> allowed for local synchronisation sets, so the real problem
> appears to be that Posix doesn't provide proper synchronisation
> control.

Or that POSIX only provides heavyweight synchronization.

As I understand it, the draft C++0x memory model has acquire/release
semantics for annotated variables.  Of course, it wouldn't help the
original test case unless the global variable was annotated.
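For concreteness, a minimal sketch in the C11 <stdatomic.h> notation that
later standardized these ideas (illustration only, not from the original
message; the atomic flag plays the role of the "annotated" variable):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool ready;          /* the "annotated" variable, starts false */
static int payload;                /* ordinary data, published through 'ready' */

void producer(void)
{
  payload = 42;
  /* release store: earlier writes may not be moved past it */
  atomic_store_explicit(&ready, true, memory_order_release);
}

int consumer(void)
{
  /* acquire load: later reads may not be moved ahead of it */
  while (!atomic_load_explicit(&ready, memory_order_acquire))
    ;                              /* spin until the producer publishes */
  return payload;                  /* guaranteed to observe 42 */
}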

Ian


RE: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Dave Korn
On 26 October 2007 16:51, Tomash Brechko wrote:

> On Fri, Oct 26, 2007 at 08:32:07 -0700, Ian Lance Taylor wrote:
>> The language standard does not forbid speculative stores to non-atomic
>> objects.

>   * Disallow speculative stores on potentially shared objects.
>   * Disallow reading and re-writing of unrelated objects. (For
> instance, if you have struct S{ char a,b; }; it is not OK to
> modify b by reading in the whole struct, bit-twiddling b, and
> writing the whole struct because that would interfere with
> another thread that is trying to write to a.)

  I don't see how that second one is possible in the most general case.  Some
cpus don't have all widths of access mode; and how could it possibly work for
sub-word bitfields?  (Or are those just to be considered 'related'?)

> So, will "potentially shared objects" be marked as such explicitly by
> the programmer, or is it a compiler job to identify them?

  Well, the compiler can certainly do some of that (cf. escape analysis), but
it's always going to have to be vastly more conservative than it could be if
the programmer directs it with annotations.  As far as I can see, we'd either
need some very thorough LTO, or we'd just have to treat /all/ globals this way
indiscriminately.
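
To make the escape-analysis point concrete, a small sketch (my own, with
a stand-in publish() helper that is not from the thread):

int global_counter;                /* potentially shared: visible to any thread */

static void publish(int *p)        /* stand-in for handing a pointer to another
                                      thread; the compiler must assume the worst */
{
  (void) p;
}

int f(int cond)
{
  int private_tmp = 0;             /* address never escapes: speculative stores
                                      to it cannot be observed by other threads */
  int maybe_shared = 0;

  publish(&maybe_shared);          /* address escapes: must be treated like a
                                      potentially shared object */
  if (cond)
    {
      private_tmp = 1;
      maybe_shared = 1;
      global_counter = 1;
    }
  return private_tmp + maybe_shared;
}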

  Aren't we about to reinvent -fvolatile, with all the hideous performance
losses that that implies?

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: GCC 4.3 release schedule

2007-10-26 Thread Dennis Clarke

> On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
>> Richard Guenther wrote:
>> > On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
>> >>

>
> ... when we think it's ready.  It doesn't help anyone to declare victory
> and release 4.3.0 when it still miscompiles the kernel (not that I know
> if it does).  Warm fuzzyness for PMs put aside.

At the risk of annoying you Red Hat Linux guys ( and Linux people in general
) you may be surprised to hear that there are problems for UNIX(tm) users
out there.  Now I have tried and failed to get a successful bootstrap build
of GCC 4.2.2 on Solaris 8 ( Sparc or x86 ) and on Solaris 10. When I look at
the Build status page I see no one has posted a result there for GCC 4.2.2 :

  Please see : http://gcc.gnu.org/gcc-4.2/buildstat.html

I was able to get a decent build with GCC 4.2.1 however :

 And see : http://gcc.gnu.org/ml/gcc-testresults/2007-08/msg00318.html

Exact same machine with exact same environment can not build GCC 4.2.2.

Now then, you seem to be discussing GCC 4.3 when GCC 4.2.x still does not
build correctly on a highly standards compliant UNIX platform.  Am I reading
this correctly ?  If not .. then please educate me if you can. I would like
to at least see GCC 4.2.2 bootstrap out of the box before flailing forwards
to GCC 4.3.x.

-
Dennis Clarke



Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread skaller

On Fri, 2007-10-26 at 16:05 +0200, Bart Van Assche wrote:
> On 10/22/07, Andrew Haley  wrote:

> I agree that according to the C and C++ language standards, any
> variable shared over threads should be declared volatile.

No, they say nothing about multi-threaded programs.

> My opinion is that it should be possible to declare whether C/C++ code
> has acquire, release or acquire+release semantics. The fact that code
> has acquire semantics means that no subsequent load or store
> operations may be moved in front of that code, and the fact that code
> has release semantics means that no preceding load or store operations
> may be moved past that code. Adding definitions for acquire and
> release semantics in pthread.h would help a lot. E.g.
> pthread_mutex_lock() should be declared to have acquire semantics, and
> pthread_mutex_unlock() should be declared to have release semantics.
> Maybe it is a good idea to add the following function attributes in
> gcc: __attribute__((acquire)) and __attribute__((release)) ? A
> refinement of these attributes would be to allow to specify not only
> the acquire/release attributes, but also the memory locations to which
> the acquire and release apply (pthreads synchronization functions
> always apply to all memory locations).

That sounds quite interesting!

But, now you should continue this idea. You are suggesting
primitives to attach to 'code' but  then only say 'functions'.
What about plain old statements? Expressions?

Now you need to specify a calculus for these properties.

I think this idea is really hot .. it's so much simpler
and fine-grained than just having a mutex.
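
Purely as a sketch of the proposal quoted above (these attributes do not
exist in GCC, and the wrapper names are hypothetical):

#include <pthread.h>

/* Hypothetical annotations only: __attribute__((acquire)) and
   __attribute__((release)) are the proposed extension under discussion,
   not something current GCC understands.  */
extern int my_mutex_lock (pthread_mutex_t *m)
     __attribute__ ((acquire));    /* later loads/stores may not move above the call */
extern int my_mutex_unlock (pthread_mutex_t *m)
     __attribute__ ((release));    /* earlier loads/stores may not move below the call */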


-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


RE: -fno-tree-cselim not working?

2007-10-26 Thread Andrew Haley
Dave Korn writes:
 > On 26 October 2007 15:59, Ian Lance Taylor wrote:
 > 
 > > Andreas Schwab <[EMAIL PROTECTED]> writes:
 > > 
 > >> Ian Lance Taylor <[EMAIL PROTECTED]> writes:
 > >> 
 > >>> The above code happens to use pthread_mutex_trylock, but there is no
 > >>> need for that.
 > >> 
 > >> pthread_mutex_trylock is special, because POSIX says it is a memory
 > >> synchronisation point (see section 4.10 Memory Synchronization).
 > > 
 > > Sure.  But the argument that gcc is doing something wrong stands up
 > > just fine even we just test a global variable.  The argument that gcc
 > > is doing something wrong does not rely on the fact that the function
 > > called is pthread_mutex_trylock.
 > 
 >   Indeed; as I understand it, what the argument that gcc is doing
 > something wrong relies on is the incorrect assumption that *all*
 > variables ought to behave like volatile ones, i.e. have an exact
 > one-to-one relationship between loads and stores as written in the
 > HLL and actual machine load/store operations.

No, it doesn't, it relies on the fact that POSIX allows shared access
to non-volatile memory as long as mutexes are used to enforce mutual
exclusion.  POSIX does not require memory used in such a way to be
declared volatile.  The problem with your argument is that you're
looking at ISO C but not at POSIX threads.

The new C++ threads memory model is intended by its authors to solve
this mess.

Andrew.


Re: GCC 4.3 release schedule

2007-10-26 Thread Mark Mitchell
Andrew MacLeod wrote:

> we can at least make projected dates known so we have something firmer
> than "at some point in the future" :-)

The canonical rule of project management is: "Features, Schedule, Cost:
Pick At Most 2." :-)  In other words, you can decide what features you
want and when you want them by -- but be prepared to pay for a vast
team.  Or, you can decide what you want to pay, and when you want it --
but be prepared to get whatever features are done by then with the team
you paid for.  In the case of GCC, it's worse than that since we're not
all interested in the same things, or being in any way centrally directed.

As RM, I try to take into account what I know about when distributors
will be applying effort, but I must absolutely avoid in any way tilting
the FSF release process towards the needs of one distributor, possibly
at the expense of another.  I don't think it's appropriate for us to set
a schedule tailored to any one distributor's needs -- and there are a
lot more distributors than just Red Hat and SuSE, so I'd say that even
if you were on the same schedule.  But, I certainly do think it's
helpful for a contributor to tell us when resources might be available
and I appreciate you sharing that information.

If you're interested in driving the release to a particular date, the
best thing you can do is to go clear out the P1s in bugzilla and then
bash out a few P2s.  (I've noticed Red Hat folks doing some of that
already, thanks!)  I'd imagine that the dates you want to hit would be
achievable if you, Jakub, Jason, etc. all work on issues.

I've found schedules for GCC to be very hard to predict.  As I said in
my status report, our practice has been to cut the release branch when
we reach 100 regressions, and release 2-4 months after that point,
depending on quality on the branch.  To be honest, I'd rather wait
longer to make the branch -- but there tends to be intense pressure in
the developer community to make a branch so we can get on to the next
round of major features.  In any case, after we make the branch, it's in
regression-only mode, so stability tends to be quite good, though
dot-zero releases are, after all, dot-zero releases.

Thanks,

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Fri, Oct 26, 2007 at 08:32:07 -0700, Ian Lance Taylor wrote:
> The language standard does not forbid speculative stores to non-atomic
> objects.

That's why there's a proposal to refine the language.  I meant
the following:

  http://www.artima.com/cppsource/threads_meeting.html:

  Hans Boehm and Herb Sutter both presented very detailed and
  well-thought out memory models. Their differences are subtle and
  important, but in broad strokes, both proposals paint a similar
  picture. In particular, both proposals:

  * Specify a set of atomic (aka, interlocked) primitive operations.
  * Explicitly specify the ordering constraints on atomic reads and writes.
  * Specify the visibility of atomic writes.
  * Disallow speculative stores on potentially shared objects.
  * Disallow reading and re-writing of unrelated objects. (For
instance, if you have struct S{ char a,b; }; it is not OK to
modify b by reading in the whole struct, bit-twiddling b, and
writing the whole struct because that would interfere with
another thread that is trying to write to a.)
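
To make the last bullet concrete, a sketch (mine, assuming a little-endian
target where the struct is two bytes and b is the second byte):

#include <string.h>

struct S { char a, b; };
static struct S s;                 /* a and b may be protected by different locks */

/* The implementation strategy the proposal forbids for "s.b = v":
   read both bytes, twiddle b, write both bytes back.  The write-back
   silently re-stores s.a, racing with a thread updating s.a.  */
void set_b_via_word_rmw(char v)
{
  unsigned short word;
  memcpy(&word, &s, sizeof word);                          /* load a and b together  */
  word = (word & 0x00ffu) | ((unsigned short)(unsigned char)v << 8);
  memcpy(&s, &word, sizeof word);                          /* store a and b together */
}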


So, will "potentially shared objects" be marked as such explicitly by
the programmer, or is it a compiler job to identify them?


-- 
   Tomash Brechko


Re: GCC 4.3 release schedule

2007-10-26 Thread Richard Guenther
On 10/26/07, Dennis Clarke <[EMAIL PROTECTED]> wrote:
>
> > On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
> >> Richard Guenther wrote:
> >> > On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
> >> >>
>
> >
> > ... when we think it's ready.  It doesn't help anyone to declare victory
> > and release 4.3.0 when it still miscompiles the kernel (not that I know
> > if it does).  Warm fuzzyness for PMs put aside.
>
> At the risk of annoying you Red Hat Linux guys ( and Linux people in general
> ) you may be surprised to hear that there are problems for UNIX(tm) users
> out there.  Now I have tried and failed to get a successful bootstrap build
> of GCC 4.2.2 on Solaris 8 ( Sparc or x86 ) and on Solaris 10. When I look at
> the Build status page I see no one has posted a result there for GCC 4.2.2 :
>
>   Please see : http://gcc.gnu.org/gcc-4.2/buildstat.html
>
> I was able to get a decent build with GCC 4.2.1 however :
>
>  And see : http://gcc.gnu.org/ml/gcc-testresults/2007-08/msg00318.html
>
> Exact same machine with exact same environment can not build GCC 4.2.2.
>
> Now then, you seem to be discussing GCC 4.3 when GCC 4.2.x still does not
> build correctly on a highly standards compliant UNIX platform.  Am I reading
> this correctly ?

Yes.  This is because the interest in 4.2.x is much lower than in 4.3.0
right now.

> If not .. then please educate me if you can. I would like
> to at least see GCC 4.2.2 bootstrap out of the box before flailing forwards
> to GCC 4.3.x.

Patches welcome.  Certainly there are targets that are less maintained
(and tested) than the *-linux targets.  But without infinite resources
we cannot do anything about this.  Thus, in the case of Solaris - talk to Sun.

Richard.


Re: GCC 4.3 release schedule

2007-10-26 Thread Dennis Clarke

> Mark Mitchell wrote:
>> As I said in
>> my status report, our practice has been to cut the release branch when
>> we reach 100 regressions, and release 2-4 months after that point,
>> depending on quality on the branch.  To be honest, I'd rather wait
>> longer to make the branch -- but there tends to be intense pressure in
>> the developer community to make a branch so we can get on to the next
>> round of major features.
>

   Is "correctness" a feature ?

   I would like to see a nice clean GCC 4.2.x before GCC 4.3.zero even gets
thought of.  Why would one simply branch towards the next release when
the previous one still needs some work?  To appease sales people and
developers making noises for features?

> I don't want to start a flame-fest, but perhaps we could reconsider the
> release-branching criteria.

  I will read intently.

Dennis Clarke



Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Samuel Tardieu
> "Sam" == Samuel Tardieu <[EMAIL PROTECTED]> writes:

Sam> In the following example, is the access to "Shared" considered
Sam> unsynchronized even though what looks like a proper lock is used
Sam> around it?

The call to Always_Unlock was incorrect in the previous example; a fixed
version exhibiting the bug is now at http://pastebin.com/f1bc7ba32

  Sam
-- 
Samuel Tardieu -- [EMAIL PROTECTED] -- http://www.rfc1149.net/



RE: -fno-tree-cselim not working?

2007-10-26 Thread Dave Korn
On 26 October 2007 17:11, Andrew Haley wrote:

>  >   Perhaps I've jumped into the wrong one of the two near-identical
>  > threads we have going on this, but my understanding of the original
>  > complaint was that gcc writes to the variable regardless of whether
>  > it needs an update or not, and that this is a problem in the case
>  > where one thread is accessing *without* using the mutex that the
>  > other one *is* using.
> 
> No, that is not the problem.
> 
> The problem is code like this:
> 
> int counter;
> 
> ...
> 
>   if (we_hold_counters_mutex)
> counter++;
> 
> This is legal POSIX threads code: counter is not accessed when we do
> not hold the mutex. 

  Well, that's the bone of contention.  I suggest that you cannot infer
"counter is not accessed" when the condition is false, according to the C
language spec, because there are two different semantics of the word
"accessed" here: as-if accessed, as per the ideal machine definition of C, and
actually accessed, as in what actually happens in the underlying code.  The C
standard only makes guarantees about the as-if accesses.

  We need to take more care in this discussion to be clear about which sense
of the word "access" we mean at any time, otherwise anything we say is going
to be underdefined and ambiguous.

> According to POSIX we do not have to declare
> volatile memory that we only access when we hold a mutex.

  Is the problem that POSIX doesn't make the distinction between as-if and
actual behaviour that is such an essential part of the C standard?

> gcc turns this code into
> 
>   tmp = counter;
>   if (we_hold_the_counters_mutex)
> tmp++;
>   counter = tmp;

  Yes, I understand that perfectly well.  That is actually what I described in
my first quoted paragraph above - but I was referring to /actually/ accessed,
not /as-if/ accessed.

> This introduces a data race that is not in the user's program.

  Do you mean that it's not in the as-if idealised version of the program that
the compiler constructs, or that it's not in the high-level source?  A


cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: -fno-tree-cselim not working?

2007-10-26 Thread skaller

On Fri, 2007-10-26 at 08:27 -0700, Ian Lance Taylor wrote:
> skaller <[EMAIL PROTECTED]> writes:
> 

> > I understand that's the common meaning .. but the semantics
> > aren't specified for ISO C/C++.
> 
> Sure they are.  In the C99 standard look at the definition of sequence
> points (5.1.2.3) and the definition of volatile (6.7.3).

6.7.3 volatile bullet is non-normative. As pointed
out in previous post: volatile accesses are observable.
What 6.7.3 says is not a definition, but something which can
be deduced from their status as observables.

The key point which makes this waffle is the bit that says
'what constitutes an access is implementation defined'.

-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


RE: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread skaller

On Fri, 2007-10-26 at 16:24 +0100, Dave Korn wrote:
> On 26 October 2007 16:15, Robert Dewar wrote:
> 
> > One problem at the language standards level is that you can't easily
> > talk about loads and stores, since you are defining an as-if semantic
> > model, and if you make a statement about loads and stores, any other
> > sequence which behaves as if that sequence were obeyed is allowed. 
> 
>   Well, that's precisely the problem - specifically in the context of
> memory-mapped I/O registers - that volatile was invented to solve.  It may
> never have been clearly defined in the formal language of the specs, but I
> thought it was pretty clear in intent: the compiler will emit exactly one
> machine load/store operation for any rvalue reference/lvalue assignment
> (respectively) in the source, at the exact sequence point in the generated
> code corresponding to the location of the reference in the source.  Any other
> variable may be accessed more or fewer times than is written, and may be
> accessed at places other than exactly where the reference is written in the
> source, subject only to the as-if rule.

Volatile semantics aren't defined, you have it backwards.

Bart hinted at the way it really works: it isn't a definition,
and it isn't a specification: volatile is part of the 
*conformance* model. Volatile accesses are *observable*.

So when the standard says of:

int a = 1;
int b = 2;
printf("%d %d",a,b);

that a is initialised then b, the compiler can ignore the standard,
because there is no observable way to tell what the ordering is,
except that it has to be complete before the print happens.

If a,b above were volatile .. then the ordering is directly
observable, so the compiler is constrained to obey the
rules.

The point is -- there is no new rule here, and no definition
of what a volatile semantics is: volatile variables have
the SAME 'semantics' as any other variable. If I write:

int a = 1;
printf("%d", a);
int b = 2;
printf("%d %d", a, b);

then it is just the same as if a,b were volatile.
In fact I could put

double x=sin(1.0);

instead of the printf .. sin is a library function.
This means a debugger with a breakpoint on the sin can be
used to bug out the compiler as non-conforming if 'a' isn't
set (or, if b IS set) .. but print statements are easier :)

Stick a user defined function in there, which itself isn't
observable .. and all bets are off again.


-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


RFC: Creating a live, all-encompassing architectural document for GCC

2007-10-26 Thread Diego Novillo
This idea is still very raw in my mind, so apologies in advance for
being deliberately vague.  For the last few weeks I have been thinking
on ways to address the sorry state of our internal documentation.

We all agree that none of us has global knowledge of all aspects of
the compiler.  It's just impossible.  So, we have the collective
knowledge distributed all over the place but it is pretty hard for
someone to navigate the compiler without talking to N different
people.

So, I think the problem goes a bit beyond mere documentation of how a
module works at a high level.  I would like to have a navigable
document that also describes the flow of things, interfaces and
helpers.  Starting at main.c:main() and ending at toplev.c:finalize().

The key properties that I'm after are:

- Navigable.

From high-level details to low-level implementation of a single file.
This would be a set of browseable hierarchies that offer different
views of the documentation.  For instance, one view would be a
decomposition of modules, each of which would offer a view of
submodules and such.
Another view could focus on the compilation flow.

- Live and easy to modify.

It should be easy for an individual maintainer (or even user) to go in
and modify parts of the document that are incomplete/missing/wrong.
This and navigability suggest a wikipedia-like approach.  We even have
the beginnings of some of this in the wiki, so I would like to build
on that.


- Close correspondence to mainline.

This is where it gets hard.  We need to have a way of ensuring that code
updates which change internal or external API properties are
reflected in the document.  With this I don't mean that every single
patch should be accompanied with a documentation change.  However, if
a patch refactors a module and its internal interfaces are changed,
then the patch should be accompanied with a change to the
documentation.


- Transfer from document to code

The documentation for individual modules and files should be linked to
the actual source code.  Perhaps this could be automatically generated
with tools like javadoc or doxygen.

The high-level description of algorithms, strategies, heuristics,
interfaces, usage, etc live on the document and we provide links to
the automatically generated documentation or SVN live sources.  This
would provide more low-level documentation in the form of individual
function documents and the usual things we request in each file.
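
As a small illustration (my sketch, not an existing GCC comment) of the
kind of per-function comment that doxygen or javadoc could turn into a
browsable reference page linked from the wiki:

/**
 * @brief   Compute the greatest common divisor of two non-negative ints.
 * @param a first operand
 * @param b second operand
 * @return  gcd(a, b); gcd(0, 0) is taken to be 0.
 */
int
gcd (int a, int b)
{
  while (b != 0)
    {
      int t = a % b;
      a = b;
      b = t;
    }
  return a;
}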


So, I think my inclination is to provide this document as a wiki.  The
current wiki should provide all the support we need.  We can create
different views and organization incrementally, it's easy for people
to edit it, and we could put guards around it if it started being abused.

My ultimate goal is to facilitate the transfer of knowledge.  When
someone new wants to start working on GCC, we should be able to point
them to this document and say "here, drink from this hose".


Thanks.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread skaller

On Fri, 2007-10-26 at 20:17 +0400, Tomash Brechko wrote:

> cases.  Only globals, or locals whose address was passed to some
> function, should be treated specially.

err .. what about the heap??

And what do you do if you do not KNOW what the storage class is,
which is the case 99.99% of the time in C++ member functions?

-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Michael Matz
Hi,

On Fri, 26 Oct 2007, Tomash Brechko wrote:

> It was already said that instead of disallowing all optimization with
> volatile, the optimization itself may be done a bit differently.
> Besides, the concern that it will hurt performance at large is a bit
> far-fetched.  You may still speculatively store to an automatic variable
> whose address was never taken, and this alone covers 50%--80% of
> cases.

Both the assessment of far-fetchedness and these numbers seem to be 
invented ad hoc.  The latter is irrelevant (it's not interesting how many 
cases there are, but how important the cases which do occur are, for some 
metric, say performance).  And the former isn't true, i.e. the 
concern is not far-fetched.  For 456.hmmer, for instance, it is crucial 
that this transformation happens; the basic situation looks like this:

int f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp,
  int *tpdm, int xmb, int *bp, int *ms)
{
  int k, sc;
  for (k = 1; k <= M; k++)
{
  mc[k] = mpp[k-1]   + tpmm[k-1];
  if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
  if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
  if ((sc = xmb  + bp[k]) > mc[k])  mc[k] = sc;
  mc[k] += ms[k];
}
}

Here the conditional stores to mc[k] are better implemented as 
conditional moves, otherwise you lose about 25% performance on some 
platforms.  See PR27313, for which I implemented this transformation on 
the tree level.  A similar transformation has been done for a much longer 
time by the RTL if-cvt.  All of these are currently completely valid 
transformations, so they could only be redefined as invalid by some other 
memory model.  Such an alternative memory model has to take into account 
the performance implications, which do exist, contrary to what some 
proponents of a different model claim.  Certainly some suggestions for 
another memory model look quite similar to treating all non-automatic 
objects as volatile, at which point one may fairly ask why not simply 
use 'volatile'.
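
For readers following along, the transformation in question written out
by hand (an illustration, not the actual PR27313 patch):

/* Source form: the store to mc[k] happens only on the taken path.  */
void max_into (int *mc, int k, int sc)
{
  if (sc > mc[k])
    mc[k] = sc;
}

/* What tree-level store sinking / if-conversion effectively produces:
   the load and the store become unconditional and only the value is
   selected, which lets the backend emit a branchless compare + cmov;
   it is also what writes mc[k] even when the condition is false.  */
void max_into_after_cselim (int *mc, int k, int sc)
{
  int tmp = mc[k];
  if (sc > tmp)
    tmp = sc;
  mc[k] = tmp;
}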

> Only globals, or locals whose address was passed to some
> function, should be treated specially.  Also, for the case
> 
>   void
>   f(int set_v, int *v)
>   {
> if (set_v)
>   *v = 1;
>   }
> 
> there's no load-maybe_update-store optimization, so there won't be
> slowdown for such cases also (BTW, how this case is different from
> when v is global?).

The difference is, that 'v' might be zero, hence *v could trap, hence it 
can't be moved out of its control region.  If you somehow could determine 
that *v can't trap (e.g. by having a dominating access to it already) then 
the transformation will be done.
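
A small sketch of the dominating-access point (my example, not from the
thread):

/* No dominating access: if v is null or points to unmapped memory,
   *v could trap, so the store cannot be hoisted out of the 'if'.  */
void store_no_dominator (int set_v, int *v)
{
  if (set_v)
    *v = 1;
}

/* A dominating read of *v proves the access cannot trap, so the
   compiler is free to turn the conditional store into an
   unconditional load / select / store sequence.  */
void store_with_dominator (int set_v, int *v)
{
  int old = *v;
  if (set_v)
    *v = old + 1;
}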


Ciao,
Michael.


RE: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Richard Kenner
 I thought it was pretty clear in intent: the compiler will emit
 exactly one machine load/store operation for any rvalue
 reference/lvalue assignment (respectively) in the source, at the exact
 sequence point in the generated code corresponding to the location of
 the reference in the source.

The problem is that "one machine load/store operation" and "any rvalue"
aren't precisely-defined terms.  We've had numerous discussions on this
list before about how one might want to define them.


RE: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Michael Matz
Hi,

On Sat, 27 Oct 2007, skaller wrote:

> The point is -- there is no new rule here, and no definition
> of what a volatile semantics is: volatile variables have
> the SAME 'semantics' as any other variable. If I write:
> 
>   int a = 1;
>   printf("%d", a);
>   int b = 2;
>   printf("%d %d", a, b);
> 
> then it is just the same as if a,b were volatile.

Not at all.  As neither a nor b is global memory, printf() (or any other 
function) cannot access them, hence no observer can determine whether or 
not 'b' is already set, in contrast to when a and b are volatile.


Ciao,
Michael.


RE: -fno-tree-cselim not working?

2007-10-26 Thread Dave Korn
On 26 October 2007 17:28, Andrew Haley wrote:

> Richard Guenther writes:
>  > >
>  > > This is legal POSIX threads code: counter is not accessed when we do
>  > > not hold the mutex.  According to POSIX we do not have to declare
>  > > volatile memory that we only access when we hold a mutex.
>  >
>  > I hope we're not trying to support such w/o volatile counter.
> 
> I think we have to: not just for POSIX, but for the Linux kernel too.
> 
>  > Whatever POSIX says, this would pessimize generic code too much.
> 
> We don't have to do it for non-threaded code.

  I certainly won't object to any move to prohibit the
read-conditional-add-write (and related) optimisation(s) when compiling with
an option that explicitly specifies that we are compiling multi-threaded code.

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: -fno-tree-cselim not working?

2007-10-26 Thread Richard Guenther
On 10/26/07, Andrew Haley <[EMAIL PROTECTED]> wrote:
> Dave Korn writes:
>  > On 26 October 2007 16:27, Andrew Haley wrote:
>  >
>  > > Dave Korn writes:
>  > >  > On 26 October 2007 15:59, Ian Lance Taylor wrote:
>  >
>  > >  > > Sure.  But the argument that gcc is doing something wrong stands up
>  > >  > > just fine even we just test a global variable.  The argument that 
> gcc
>  > >  > > is doing something wrong does not rely on the fact that the function
>  > >  > > called is pthread_mutex_trylock.
>  > >  >
>  > >  >   Indeed; as I understand it, what the argument that gcc is doing
>  > >  > something wrong relies on is the incorrect assumption that *all*
>  > >  > variables ought to behave like volatile ones, i.e. have an exact
>  > >  > one-to-one relationship between loads and stores as written in the
>  > >  > HLL and actual machine load/store operations.
>  > >
>  > > No, it doesn't, it relies on the fact that POSIX allows shared access
>  > > to non-volatile memory as long as mutexes are used to enforce mutual
>  > > exclusion.  POSIX does not require memory used in such a way to be
>  > > declared volatile.  The problem with your argument is that you're
>  > > looking at ISO C but not at POSIX threads.
>  >
>  >   Perhaps I've jumped into the wrong one of the two near-identical
>  > threads we have going on this, but my understanding of the original
>  > complaint was that gcc writes to the variable regardless of whether
>  > it needs an update or not, and that this is a problem in the case
>  > where one thread is accessing *without* using the mutex that the
>  > other one *is* using.
>
> No, that is not the problem.
>
> The problem is code like this:
>
> int counter;
>
> ...
>
>   if (we_hold_counters_mutex)
> counter++;
>
> This is legal POSIX threads code: counter is not accessed when we do
> not hold the mutex.  According to POSIX we do not have to declare
> volatile memory that we only access when we hold a mutex.

I hope we're not trying to support such w/o volatile counter.  Whatever
POSIX says, this would pessimize generic code too much.

Richard.


Re: GCC 4.3 release schedule

2007-10-26 Thread David Daney

Mark Mitchell wrote:

As I said in
my status report, our practice has been to cut the release branch when
we reach 100 regressions, and release 2-4 months after that point,
depending on quality on the branch.  To be honest, I'd rather wait
longer to make the branch -- but there tends to be intense pressure in
the developer community to make a branch so we can get on to the next
round of major features.


I don't want to start a flame-fest, but perhaps we could reconsider the 
release-branching criteria.


As Richard indicated, some (including me) might prefer delaying the 
branch until we are much closer to the release.  It doesn't seem ideal 
to re-live the 4.2 schedule ad infinitum.


As for what the actual date for release should be, I have no preference.

David Daney


RE: -fno-tree-cselim not working?

2007-10-26 Thread Andrew Haley
Dave Korn writes:
 > On 26 October 2007 16:27, Andrew Haley wrote:
 > 
 > > Dave Korn writes:
 > >  > On 26 October 2007 15:59, Ian Lance Taylor wrote:
 > 
 > >  > > Sure.  But the argument that gcc is doing something wrong stands up
 > >  > > just fine even we just test a global variable.  The argument that gcc
 > >  > > is doing something wrong does not rely on the fact that the function
 > >  > > called is pthread_mutex_trylock.
 > >  >
 > >  >   Indeed; as I understand it, what the argument that gcc is doing
 > >  > something wrong relies on is the incorrect assumption that *all*
 > >  > variables ought to behave like volatile ones, i.e. have an exact
 > >  > one-to-one relationship between loads and stores as written in the
 > >  > HLL and actual machine load/store operations.
 > > 
 > > No, it doesn't, it relies on the fact that POSIX allows shared access
 > > to non-volatile memory as long as mutexes are used to enforce mutual
 > > exclusion.  POSIX does not require memory used in such a way to be
 > > declared volatile.  The problem with your argument is that you're
 > > looking at ISO C but not at POSIX threads.
 > 
 >   Perhaps I've jumped into the wrong one of the two near-identical
 > threads we have going on this, but my understanding of the original
 > complaint was that gcc writes to the variable regardless of whether
 > it needs an update or not, and that this is a problem in the case
 > where one thread is accessing *without* using the mutex that the
 > other one *is* using.

No, that is not the problem.

The problem is code like this:

int counter;

...

  if (we_hold_counters_mutex)
counter++;

This is legal POSIX threads code: counter is not accessed when we do
not hold the mutex.  According to POSIX we do not have to declare
volatile memory that we only access when we hold a mutex.

gcc turns this code into

  tmp = counter;
  if (we_hold_the_counters_mutex)
tmp++;
  counter = tmp;

This introduces a data race that is not in the user's program.

Andrew.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Samuel Tardieu
On 26/10, Robert Dewar wrote:

| Of course in Ada there is a clear notion of threads semantic, and
| a clear definition of what the meaning of code is in the presence
| of threads, so the specific situation discussed here is easy to
| deal with (though Ada takes the view that unsynchronized shared access to
| non-atomic or non-volatile data from separate threads has undefined
| effects).

In the following example, is the access to "Shared" considered
unsynchronized even though what looks like a proper lock is used
around it?


package P is

   Shared : Natural := 0;

   procedure Maybe_Increment;

end P;


package body P is

   protected Lock is
  procedure Maybe_Lock (Locked : out Boolean);
  procedure Always_Unlock;
   private
  Is_Locked : Boolean := False;
   end Lock;

   protected body Lock is

  procedure Always_Unlock is
  begin
 Is_Locked := False;
  end Always_Unlock;

  procedure Maybe_Lock (Locked : out Boolean) is
  begin
 Locked:= not Is_Locked;
 Is_Locked := True;
  end Maybe_Lock;

   end Lock;

   procedure Maybe_Increment is
  L : Boolean;
   begin
  Lock.Maybe_Lock (L);
  if L then
 Shared := Shared + 1;
  end if;
  Lock.Always_Unlock;
   end Maybe_Increment;

end P;

By naively reading the code, I would assume that if two tasks were to
call Maybe_Increment once, after completion of those tasks Shared would
contain either 1 or 2, depending on whether they both got the lock in
turn or if only one of them got it.

However, if you look at the x86 code for Maybe_Increment (-O3
-fomit-frame-pointer -fno-inline), you'll see:

 1   p__maybe_increment:
 2   .LFB11:
 3 subl$12, %esp
 4 .LCFI6:
 5 movl$p__lock, %eax
 6 callp__lock__maybe_lockP
 7 cmpb$1, %al
 8 movlp__shared, %eax <=== unconditional load
 9 sbbl$-1, %eax   <=== conditional +1
10 movl%eax, p__shared <=== unconditional store
11 movl$p__lock, %eax
12 addl$12, %esp
13 jmp p__lock__always_unlockP

Note lines 8 to 10: on a multiprocessor system with both tasks running at
the same time on different processors, you can end up with Shared being
zero after the two tasks have ended, because the task that fails to get the
lock still loads and stores Shared unconditionally and can overwrite the
other task's increment with the stale value (for example if the task getting
the lock runs one or two instructions ahead of the one without the lock).

  Sam



RE: -fno-tree-cselim not working?

2007-10-26 Thread Dave Korn
On 26 October 2007 16:27, Andrew Haley wrote:

> Dave Korn writes:
>  > On 26 October 2007 15:59, Ian Lance Taylor wrote:

>  > > Sure.  But the argument that gcc is doing something wrong stands up
>  > > just fine even we just test a global variable.  The argument that gcc
>  > > is doing something wrong does not rely on the fact that the function
>  > > called is pthread_mutex_trylock.
>  >
>  >   Indeed; as I understand it, what the argument that gcc is doing
>  > something wrong relies on is the incorrect assumption that *all*
>  > variables ought to behave like volatile ones, i.e. have an exact
>  > one-to-one relationship between loads and stores as written in the
>  > HLL and actual machine load/store operations.
> 
> No, it doesn't, it relies on the fact that POSIX allows shared access
> to non-volatile memory as long as mutexes are used to enforce mutual
> exclusion.  POSIX does not require memory used in such a way to be
> declared volatile.  The problem with your argument is that you're
> looking at ISO C but not at POSIX threads.

  Perhaps I've jumped into the wrong one of the two near-identical threads we
have going on this, but my understanding of the original complaint was that
gcc writes to the variable regardless of whether it needs an update or not,
and that this is a problem in the case where one thread is accessing *without*
using the mutex that the other one *is* using.  The false assumption is that
the code sequence "if (cond) var = value" will only touch var if cond is true
- this assumption is only valid for volatile variables.
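
A minimal contrast of the two cases being distinguished here (my example;
whether the plain-variable behaviour is acceptable under POSIX is exactly
what is in dispute):

int plain_flag;
volatile int hw_flag;

void maybe_set_plain (int cond)
{
  /* Under the ISO C as-if rule the compiler may load and store
     plain_flag even when cond is false, as long as the final value
     is right for a single-threaded execution.  */
  if (cond)
    plain_flag = 1;
}

void maybe_set_volatile (int cond)
{
  /* A volatile access may never be introduced speculatively: exactly
     one store, and only when cond is true.  */
  if (cond)
    hw_flag = 1;
}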

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Fri, Oct 26, 2007 at 17:00:28 +0100, Dave Korn wrote:
> >   * Disallow speculative stores on potentially shared objects.
> >   * Disallow reading and re-writing of unrelated objects. (For
> > instance, if you have struct S{ char a,b; }; it is not OK to
> > modify b by reading in the whole struct, bit-twiddling b, and
> > writing the whole struct because that would interfere with
> > another thread that is trying to write to a.)
> 
>   I don't see how that second one is possible in the most general case.  Some
> cpus don't have all widths of access mode;

From http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf:

  Fortunately, the original motivation for this lax specification
  seems to stem from machine architectures that did not support
  byte-wide stores.  To our knowledge, no such architectures are still
  in wide-spread multiprocessor use.


> and how could it possibly work for sub-word bitfields?  (Or are
> those just to be considered 'related'?)

How could mutex-protected, or even atomic, access to bit-fields
possibly work?  Yes, they are related, or rather they do not constitute
separate objects, but belong to one common object.


>   Aren't we about to reinvent -fvolatile, with all the hideous performance
> losses that that implies?

It was already said that instead of disallowing all optimization with
volatile, the optimization itself may be done a bit differently.
Besides, the concern that it will hurt performance at large is a bit
far-fetched.  You may still speculatively store to an automatic variable
whose address was never taken, and this alone covers 50%--80% of
cases.  Only globals, or locals whose address was passed to some
function, should be treated specially.  Also, for the case

  void
  f(int set_v, int *v)
  {
if (set_v)
  *v = 1;
  }

there's no load-maybe_update-store optimization, so there won't be
slowdown for such cases also (BTW, how this case is different from
when v is global?).


-- 
   Tomash Brechko


Re: GCC 4.3 release schedule

2007-10-26 Thread Richard Guenther
On 10/26/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:
>
> I've found schedules for GCC to be very hard to predict.  As I said in
> my status report, our practice has been to cut the release branch when
> we reach 100 regressions, and release 2-4 months after that point,
> depending on quality on the branch.  To be honest, I'd rather wait
> longer to make the branch -- but there tends to be intense pressure in
> the developer community to make a branch so we can get on to the next
> round of major features.  In any case, after we make the branch, it's in
> regression-only mode, so stability tends to be quite good, though
> dot-zero releases are, after all, dot-zero releases.

To jump in on the last point - dot-zero releases are dot-zero releases - it
makes sense to expose the branch to wider testing by putting a dot-zero
release in front of the public at branching time ;)

And I seriously dispute that branching and waiting has ever made the
branch better quality just because we branched and waited.  Instead
the opposite is true - developer resources are dragged away to work
on their stage1 projects (that is true for myself).

I'd rather take the make-a-dot-zero-release-at-branch-time approach
and count on interested people fixing bugs on the branch after the
dot-zero release.  That way, if nobody is interested in a particular
release series then we can declare the dot-zero release final - otherwise
we'd do more releases from the branch anyway.

Which still leaves us with the problem of setting criteria for releasing a
dot-zero - be it 100 regressions or zero P1 bugs or whatever.  Early
testing certainly helps here (so far we are doing build-testing only, but
I expect to put the built binaries into "production" soon), but there are
still serious problems with 4.3 at the moment, like sorting out the libstdc++
API mess.

Thanks,
Richard.


Re: -fno-tree-cselim not working?

2007-10-26 Thread Andrew Haley
Richard Guenther writes:
 > >
 > > This is legal POSIX threads code: counter is not accessed when we do
 > > not hold the mutex.  According to POSIX we do not have to declare
 > > volatile memory that we only access when we hold a mutex.
 > 
 > I hope we're not trying to support such w/o volatile counter.

I think we have to: not just for POSIX, but for the Linux kernel too.

 > Whatever POSIX says, this would pessimize generic code too much.

We don't have to do it for non-threaded code.

Andrew.


Re: GCC 4.3 release schedule

2007-10-26 Thread Dennis Clarke
> On 10/26/07, Dennis Clarke <[EMAIL PROTECTED]> wrote:
>>On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
>> >> Richard Guenther wrote:
>> >> > On 10/26/07, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
>> >> >>
>>
>> >
>> > ... when we think it's ready.  It doesn't help anyone to declare victory
>> > and release 4.3.0 when it still miscompiles the kernel (not that I know
>> > if it does).  Warm fuzzyness for PMs put aside.
>>
>> At the risk of annoying you Red Hat Linux guys ( and Linux people in
>> general
>> ) you may be surprised to hear that there are problems for UNIX(tm) users
>> out there.  Now I have tried and failed to get a successful bootstrap
>> build
>> of GCC 4.2.2 on Solaris 8 ( Sparc or x86 ) and on Solaris 10. When I look
>> at
>> the Build status page I see no one has posted a result there for GCC 4.2.2
>> :
>>
>>   Please see : http://gcc.gnu.org/gcc-4.2/buildstat.html
>>
>> I was able to get a decent build with GCC 4.2.1 however :
>>
>>  And see : http://gcc.gnu.org/ml/gcc-testresults/2007-08/msg00318.html
>>
>> Exact same machine with exact same environment can not build GCC 4.2.2.
>>
>> Now then, you seem to be discussing GCC 4.3 when GCC 4.2.x still does not
>> build correctly on a highly standards compliant UNIX platform.  Am I
>> reading this correctly ?
>
> Yes.  This is because the interest in 4.2.x is much lower than in 4.3.0
> right now.

  Everyone wants next year's car models in the late fall. I understand that
only too well. I'm a tad more interested in quality results, however, as
opposed to a shiny new sports car where the brakes don't work.

  I hate car analogies .. but they often work. Sorry.

>> If not .. then please educate me if you can. I would like
>> to at least see GCC 4.2.2 bootstrap out of the box before flailing
>> forwards
>> to GCC 4.3.x.
>
> Patches welcome.  Certainly there are targets that are less maintained
> (and tested) than the *-linux targets.  But without infinite resources
> we cannot do anything about this.

  I can relate.  I really can.

> Thus, in the case of Solaris - talk to Sun.

I think that all the rage inside Sun is the Studio 12 and Studio 11 compilers,
which are, within reason, vastly superior to GCC.  Then again we do not have
the source code ( yet ), but we do have the sources to OpenSolaris. One would
think that after some two years of the OpenSolaris project we would be able
to build the OS with GCC and some people have tried :

  http://www.opensolaris.org/os/project/gccfss-on/

I have not tried that myself.

If one uses the free Studio 12 compiler tools from Sun then you can build
the whole OS ( big chunks at least ) quite neatly.  I have done that so many
times it is just silly :

  http://www.blastwave.org/articles/BLS-0050/index.html

The Indiana project, headed up by Ian Murdock ( the 'ian' in Debian ),
will make big changes in the Solaris landscape, and really we need to stop
thinking about Sun corporate and start looking to the extended OpenSolaris
community. Guys like me ( and Blastwave people in general ) are not on the
Sun payroll.  Conversely, Red Hat has paid people on the GCC mailing lists.  I
have my own reasons to pour efforts into GCC and one of them is simply that
*everything* in the OS should be open sourced and *everything* that a person
uses to build it should be open sourced. I think that is the key concept
behind the FSF and the Linux community in general.  It is also why I pour my
heart and guts into Blastwave.org to ensure that people have access to open
source software in a classically/traditionally closed off proprietary
operating system.  I think that project Indiana will blow the doors off
those old concepts.

In any case .. I have my reasons to want a nice clean GCC 4.2.2 for
Solaris/OpenSolaris users and my open sourcey intentions are all within
reason.

Patches are always welcome indeed.

Dennis Clarke



RE: -fno-tree-cselim not working?

2007-10-26 Thread Andrew Haley
Dave Korn writes:
 > On 26 October 2007 17:11, Andrew Haley wrote:
 > 
 > >  >   Perhaps I've jumped into the wrong one of the two near-identical
 > >  > threads we have going on this, but my understanding of the original
 > >  > complaint was that gcc writes to the variable regardless of whether
 > >  > it needs an update or not, and that this is a problem in the case
 > >  > where one thread is accessing *without* using the mutex that the
 > >  > other one *is* using.
 > > 
 > > No, that is not the problem.
 > > 
 > > The problem is code like this:
 > > 
 > > int counter;
 > > 
 > > ...
 > > 
 > >   if (we_hold_counters_mutex)
 > > counter++;
 > > 
 > > This is legal POSIX threads code: counter is not accessed when we do
 > > not hold the mutex. 
 > 
 >   Well, that's the bone of contention.  I suggest that you cannot
 > infer "counter is not accessed" when the condition is false,
 > according to the C language spec, because there are two different
 > semantics of the word "accessed" here: as-if accessed, as per the
 > ideal machine definition of C, and actually accessed, as in what
 > actually happens in the underlying code.  The C standard only makes
 > guarantees about the as-if accesses.

That's right: the problem is that POSIX threads makes assumptions that
the C standard does not guarantee.  This is why I said previously "The
problem with your argument is that you're looking at ISO C but not at
POSIX threads."  POSIX threads requires more than just strict ISO
as-if semantics.

 > > According to POSIX we do not have to declare
 > > volatile memory that we only access when we hold a mutex.
 > 
 >   Is the problem that POSIX doesn't make the distinction between
 > as-if and actual behaviour that is such an essential part of the C
 > standard?

POSIX introduces extra requirements that are not part of the C
standard.  This is one of them.  Unfortunately it does so in a
more-or-less informal way.

 > > This introduces a data race that is not in the user's program.
 > 
 >   Do you mean that it's not in the as-if idealised version of the
 > program that the compiler constructs, or that it's not in the
 > high-level source?

The program actually requires not just Standard C but Standard C +
POSIX threads.  In other words, you cannot optimize POSIX threads
programs in the same way that you optimize single-threaded programs.
This argument is the subject of Hans Boehm's paper "Threads cannot be
implemented as a library", referenced above.

Andrew.


Re: -fno-tree-cselim not working?

2007-10-26 Thread Andrew Haley
Ian Lance Taylor writes:
 > Andrew Haley <[EMAIL PROTECTED]> writes:
 > 
 > > The problem is code like this:
 > > 
 > > int counter;
 > > 
 > > ...
 > > 
 > >   if (we_hold_counters_mutex)
 > > counter++;
 > > 
 > > This is legal POSIX threads code: counter is not accessed when we do
 > > not hold the mutex.  According to POSIX we do not have to declare
 > > volatile memory that we only access when we hold a mutex.
 > 
 > Where does POSIX say that?

I think that's just the point: POSIX doesn't directly state it, but it
implies it.  The exact language used by POSIX is rather vague, and is
discussed at length in H. Boehm, ``Threads Cannot Be Implemented As a
Library'', http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html.

This paper was quite enough to convince me.

Andrew.


Re: How widely used are <ext/hash_set> and <ext/hash_map>?

2007-10-26 Thread Joe Buck

I wrote:
> > The thread arguing about this has gone on for a while, so I think
> > it's time to gather some data to answer the question of just how bad
> > it will be if we accept the decision to move ext/hash_set and ext/hash_map
> > into a different directory and to deprecate them.
> > 
> > Any of you out there who put out distros or port collections: how many 
> > packages in your distro use these classes?  You can grep for
> > 
> > '# *include *<ext/hash'
> >
> > I'm particularly interested in the "Debian number", since the package
> > collection is so large.  Other information about use of these classes
> > will be interesting as well.

On Thu, Oct 25, 2007 at 09:52:52PM -0700, Ian Lance Taylor wrote:
> Here is something to look at:
> 
> http://google.com/codesearch?q=%23include+%3Cext%2Fhash&btnG=Search&hl=en&lr=

Thanks, Ian (and thanks, Google).  There are about 400 hits, though I have
no idea how complete the Google codebase is compared to what's in a distro.

Things that will need fixing include inkscape, Boost, wxWindows, and
kdevelop, and at least 50 other programs, though it's hard to get a
good count, since the search is by file rather than by program.

I do see that monotone can use either the TR1 unordered containers or
<ext/hash_map>, but that's the only one I saw.  Lots of programs are
designed to look either for <hash_map> or for <ext/hash_map>, but they
won't look for <tr1/unordered_map>.  It also appears that some programs
are designed to fall back to <map> or <set> (therefore paying a
performance penalty) if the configure script does not find <hash_map> or
<ext/hash_map>.

Some programs try to isolate the details of where the headers are by
having one header with #ifdefs that, in turn, does the #include of
<ext/hash_map>, but many others do not.



Re: Removal of pre-ISO C++ items from include/backwards

2007-10-26 Thread Joe Buck

On Thu, 25 Oct 2007, Joe Buck wrote:
> > Has anyone checked yet on the impact on a Debian distribution of
> > these proposed changes (and even for things that are checked in,
> > they should only be thought of as "proposed" at this point)?

On Fri, Oct 26, 2007 at 10:21:57AM +0200, Richard Guenther wrote:
> I re-built openSUSE with both changes and the ext/ stuff causes 62
> build failures, while the .h header API removal causes 21 build failures.

Thanks!  That's some good, solid data.




Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Fri, Oct 26, 2007 at 19:04:10 +0200, Michael Matz wrote:
> int f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp,
>   int *tpdm, int xmb, int *bp, int *ms)
> {
>   int k, sc;
>   for (k = 1; k <= M; k++)
> {
>   mc[k] = mpp[k-1]   + tpmm[k-1];
>   if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
>   if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
>   if ((sc = xmb  + bp[k]) > mc[k])  mc[k] = sc;
>   mc[k] += ms[k];
> }
> }

Aha, but the store in this example is _never_ speculative where
concurrency is concerned: you _explicitly_ store to mc[k] anyway, so
you may as well add some stores here and there.  If mc[] is shared, it's
the programmer's responsibility to protect it with the lock.

When you remove the first and the last lines inside the loop, then all
stores will become conditional.  But only one value will get to mc[k],
so there's no point in making the only store unconditional.  Note that
it doesn't cancel cmoves, as those are loads, not stores.

But look at the whole matter another way: suppose GCC implements some
optimization, a really cool one, and users quickly find a lot of uses
for it.  But then it is discovered that this optimization is not
general enough, and in some cases wrong code is produced.  What would
you do?  Remove it?  But users will complain.  Ignore the matter?
Other users will complain.  But you may make it optional, like
-funsafe-math-optimizations or -funsafe-loop-optimizations, and
everyone is happy.

Our situation is a bit different, because 1) a speculative store is not
a bug per se, and 2) the program classes where it can do harm
(multi-threaded) and where it cannot (single-threaded) are clearly
separable.  Alright, not entirely, because we don't know when and how
libraries are used.  But that is the case for the -funsafe- options above
too.  Want a safe library?  Compile it with
-fno-thread-unsafe-optimizations, or specify that any user data
pointers to which are passed to the library should not be shared (at
least during the library call).


> >   void
> >   f(int set_v, int *v)
> >   {
> > if (set_v)
> >   *v = 1;
> >   }
> > 
> > there's no load-maybe_update-store optimization, so there won't be
> > slowdown for such cases also (BTW, how this case is different from
> > when v is global?).
> 
> The difference is, that 'v' might be zero, hence *v could trap, hence it 
> can't be moved out of its control region.  If you somehow could determine 
> that *v can't trap (e.g. by having a dominating access to it already) then 
> the transformation will be done.

Good point.  But how do I tell the compiler that it is not NULL?  The
following doesn't work either:

  void
  f(int set_v, int v[1])
  {
if (set_v)
  v[0] = 1;
  }


  void
  g(int set_v, int *v) __attribute__((nonnull));

  void
  g(int set_v, int *v)
  {
if (set_v)
  *v = 1;
  }


Please note that I'm not trying to prove you wrong, just curious about
the reasons why there's no optimization.


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Fri, Oct 26, 2007 at 21:45:03 +0400, Tomash Brechko wrote:
> Note that it doesn't cancel cmoves, as those are loads, not stores.

I just checked the x86 instruction reference: CMOVcc is reg->reg or
mem->reg, never reg->mem.  You know God's deed when you see it. :)


-- 
   Tomash Brechko


Re: RFC: Creating a live, all-encompassing architectural document for GCC

2007-10-26 Thread skaller

On Fri, 2007-10-26 at 19:34 +0200, Basile STARYNKEVITCH wrote:
> Diego Novillo wrote:

> I am more thinking aloud than actually believing that it would be a good 
> idea to switch to literate programming; I have mixed feelings towards 
> this approach, which has been extensively used in the C-- compiler 
> http://cminusminus.org/ . My personal (biased) view is that 
> sophisticated compiler technology should be coded in higher level 
> languages than C, C++ or Java

Felix uses literate programming and a high level language (Ocaml).
Switching gcc to a high-level language would have more benefits and
be easier than using LP.

I like LP, but it is monolithic and invasive, and needs its
own tools (syntax colouring .. all gone :)

-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


RE: -fno-tree-cselim not working?

2007-10-26 Thread skaller

On Fri, 2007-10-26 at 17:31 +0100, Dave Korn wrote:
> On 26 October 2007 17:11, Andrew Haley wrote:

>   Is the problem that POSIX doesn't make the distinction between as-if and
> actual behaviour that is such an essential part of the C standard?

The 'as-if' rule is present in ALL Standards of all kinds,
it is a consequence of the fact that a standard is a method
of judging whether something is non-conforming by performing
experiments on it, recording observations, and comparing
them with predictions.

Although the required behaviour is specified using an abstracted
model, only the requirements on observable properties have
any significance for the experiments: the virtual properties
are just semantic 'temporaries' used to make the predictions.

Optimisations use a different model from the one specified,
but the models are (hopefully!) congruent in the predictions
made for observables.
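
(Editor's sketch of the point in C terms: only observable behaviour
constrains the implementation, and volatile accesses are observable
while ordinary ones are not.)

  int a;            /* ordinary object: individual accesses are not observable */
  volatile int b;   /* volatile object: every access is observable behaviour */

  void
  f(void)
  {
    a = 1;          /* may be deleted under the as-if rule; no conforming
                       experiment can detect whether it ever happened */
    a = 2;
    b = 1;          /* must both be performed, and in this order */
    b = 2;
  }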


-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
Andrew Haley <[EMAIL PROTECTED]> writes:

> The problem is code like this:
> 
> int counter;
> 
> ...
> 
>   if (we_hold_counters_mutex)
>     counter++;
> 
> This is legal POSIX threads code: counter is not accessed when we do
> not hold the mutex.  According to POSIX we do not have to declare
> volatile memory that we only access when we hold a mutex.

Where does POSIX say that?

Ian


Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
skaller <[EMAIL PROTECTED]> writes:

> On Fri, 2007-10-26 at 08:27 -0700, Ian Lance Taylor wrote:
> > skaller <[EMAIL PROTECTED]> writes:
> > 
> 
> > > I understand that's the common meaning .. but the semantics
> > > aren't specified for ISO C/C++.
> > 
> > Sure they are.  In the C99 standard look at the definition of sequence
> > points (5.1.2.3) and the definition of volatile (6.7.3).
> 
> The 6.7.3 volatile bullet is non-normative.  As pointed
> out in a previous post: volatile accesses are observable.
> What 6.7.3 says is not a definition, but something which can
> be deduced from their status as observables.
> 
> The key point which makes this waffle is the bit that says
> 'what constitutes an access is implementation defined'.

OK, fair enough.

Of course, the gcc implementation does define it in the documentation.

Ian


Re: RFC: Creating a live, all-encompassing architectural document for GCC

2007-10-26 Thread Basile STARYNKEVITCH

Diego Novillo wrote:

This idea is still very raw in my mind, so apologies in advance for
being deliberately vague.  For the last few weeks I have been thinking
on ways to address the sorry state of our internal documentation.

We all agree that none of us has global knowledge of all aspects of
the compiler.  It's just impossible.  So, we have the collective
knowledge distributed all over the place but it is pretty hard for
someone to navigate the compiler without talking to N different
people.

So, I think the problem goes a bit beyond mere documentation of how a
module works at a high level.  I would like to have a navigable
document that also describes the flow of things, interfaces and
helpers.  Starting at main.c:main() and ending at toplev.c:finalize().



Maybe a possible approach would be to use literate programming 
techniques; from previous experience (still limited), I believe 
it would be more worthwhile on the interface files (i.e. the *.h files, 
some *.opt files, etc.) than on the implementation files (*.c)


Still, it is a lot of work, and it would mean to change the coding 
guidelines & coding rules expected by GCC contributors.


I am more thinking aloud than actually believing that it would be a good 
idea to switch to literate programming; I have mixed feelings towards 
this approach, which has been extensively used in the C-- compiler 
http://cminusminus.org/ . My personal (biased) view is that 
sophisticated compiler technology should be coded in higher level 
languages than C, C++ or Java


Maybe a better discipline would be to expect every coder to document his 
stuff on the GCC wiki.


All this is more a social issue than anything else. We developers don't 
like documenting our work (and this is sadly true for me too!)



(Diego, I am busy preparing the basilys branch, see my GCC summit paper).


--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: GCC 4.3 release schedule

2007-10-26 Thread Dennis Clarke

>> When I look at the Build status page I see no one has posted a result
>> there for GCC 4.2.2 :
>>
>>   Please see : http://gcc.gnu.org/gcc-4.2/buildstat.html
>
> Here are a couple of posts by Kaveh:
>   http://gcc.gnu.org/ml/gcc-testresults/2007-10/msg00388.html
>   http://gcc.gnu.org/ml/gcc-testresults/2007-10/msg00390.html
>
> They are more noisy than usual because of -fpic/-fPIC testing.
>

  Oooh .. thank you very much!

  I can handle the noise no problem and I am happy to see these results. Why
isn't the main page for build reports updated ? It *looks* like no one (
me too ) is getting clean builds.

Dennis



Re: GCC 4.3 release schedule

2007-10-26 Thread Eric Botcazou
> Why isn't the main page for build reports updated ?

Will do.

> It *looks* like no one ( me too ) is getting clean builds.

The GCC 4.2.x compiler is in pretty good shape on SPARC/Solaris, modulo the 
libgomp problems on Solaris 10 with the Sun tools.  You need to use the GNU 
tools if you care about OpenMP on Solaris 10.

-- 
Eric Botcazou


Re: GCC 4.3 release schedule

2007-10-26 Thread Dennis Clarke

>> Why isn't the main page for build reports updated ?
>
> Will do.
>
>> It *looks* like no one ( me too ) is getting clean builds.
>
> The GCC 4.2.x compiler is in pretty good shape on SPARC/Solaris, modulo the
> libgomp problems on Solaris 10 with the Sun tools.  You need to use the GNU
> tools if you care about OpenMP on Solaris 10.

I think the problem is that I am using Sun ONE Studio 8 on Solaris 8 to
build. My assumption here is that once GCC runs well on Solaris 8 Sparc v7
then it will run on any future release of either Solaris or Sparc CPU
hardware. At least that is what the ABI says and it works.

The issue that I ran into was related to the stage 2 GCC compiler binary
getting CFLAGS options that were intended for the Studio 8 cc compiler. For
some obscure reason ( to me ) I am able to get a bootstrap of GCC 4.2.1 on
the same machine with the same environment vars in place but not with GCC
4.2.2.

I'll start digging .. again.

Dennis


Re: GCC 4.3 release schedule

2007-10-26 Thread Martin Michlmayr
* Richard Guenther <[EMAIL PROTECTED]> [2007-10-26 17:51]:
> Yes.  I think Ubuntu is on track for 4.3 as well, most likely Debian, too.

I've been testing 4.3 on a number of architectures Debian supports and
filing bugs.  There are still many that haven't been resolved yet.
Of course, gcc 4.3 also introduces build failures in about 550
packages (mostly due to the C++ header inclusion clean-up) and these
need to be fixed too.  However, Debian's next release is further away
than SUSE's and Fedora's so there should be enough time to fix these
issues.
-- 
Martin Michlmayr
http://www.cyrius.com/


Re: GCC 4.3 release schedule

2007-10-26 Thread Eric Botcazou
> When I look at the Build status page I see no one has posted a result
> there for GCC 4.2.2 :
>
>   Please see : http://gcc.gnu.org/gcc-4.2/buildstat.html

Here are a couple of posts by Kaveh:
  http://gcc.gnu.org/ml/gcc-testresults/2007-10/msg00388.html
  http://gcc.gnu.org/ml/gcc-testresults/2007-10/msg00390.html

They are more noisy than usual because of -fpic/-fPIC testing.

-- 
Eric Botcazou


Re: Removal of pre-ISO C++ items from include/backwards

2007-10-26 Thread Jonathan Wakely
On 26/10/2007, skaller <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-10-25 at 22:56 +0100, Jonathan Wakely wrote:
> > The plan is to also move auto_ptr and the old bind1st/bind2nd function
> > binders to backward, if/when they are deprecated in C++0x, which would
> > give them the same status as <strstream> (deprecated in C++98)
>
> This would not be correct. When you deprecate C++2000 features,
> you should retain them in such a way that a compiler switch
> such as --std=C++2000 will ensure they're visible in the usual way.

So this doesn't go unchallenged and give people reading the archives
the wrong idea: that's exactly what Benjamin proposed. The link I
posted that you quoted in your reply said those features will only be
deprecated in C++0x mode. So in C++98 mode auto_ptr etc. will stay
exactly where they should be.

> The compiler is expected to conform to the specified standard
> and the standard libraries are an intrinsic part of the
> standard, and IMHO it would be good practice to allow
> 'strict' conformance to an older standard, whilst still
> rejecting 'never standardised' features.

Yes, that's the plan.  No one has suggested dropping support for
C++98/C++03 nor deprecating anything from those standards except in
C++0x mode.

> Might not auto_ptr etc go into a distinct c++2000 directory?

I don't think the libstdc++ maintainers have decided exactly how the
include directories will be structured when C++0x is finished and
fully-supported.

Bear in mind there is no <auto_ptr> header, so it's not as simple as
just moving individual headers.  But this is a topic for another
thread.

Jon


Re: GCC 4.3 release schedule

2007-10-26 Thread Joe Buck
On Fri, Oct 26, 2007 at 08:20:02PM +0200, Martin Michlmayr wrote:
> * Richard Guenther <[EMAIL PROTECTED]> [2007-10-26 17:51]:
> > Yes.  I think Ubuntu is on track for 4.3 as well, most likely Debian, too.
> 
> I've been testing 4.3 on a number of architectures Debian supports and
> filing bugs.  There are still many that haven't been resolved yet.
> Of course, gcc 4.3 also introduces build failures in about 550
> packages (mostly due to the C++ header inclusion clean-up) and these
> need to be fixed too.  However, Debian's next release is further away
> than SUSE's and Fedora's so there should be enough time to fix these
> issues.

You might want to hold off on investing the work in fixing those 550
packages, because I think it's premature to consider the header "cleanup"
final.

Can you estimate how many of the broken packages use <ext/hash_map> or
<ext/hash_set>?




Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Tomash Brechko
On Sat, Oct 27, 2007 at 03:06:21 +1000, skaller wrote:
> err .. what about the heap??

The heap holds objects whose addresses have been taken.  So they can
be shared.  But I haven't yet seen the optimization we are discussing
being applied to an object accessed through a pointer (see my reply
to Michael Matz).  Maybe this is just a coincidence.

I was already beaten for repeating myself, but please let me do it
once more :).  First, I have a strong belief (though I didn't test
it) that

  if (C)
    val->mem;

runs faster than

  mem->reg;
  if (C)
    val->reg;
  reg->mem;

a (short) jump will cost less than an unconditional load/store when they
are not needed (especially the store).

BTW, it would be interesting to measure whether short jumps are as bad as
long jumps, i.e. whether the CPU pipeline is flushed when the jump target is
already in it.


Second, in situation like

  loop
    if (C)
      val->mem;

i.e. when there are lots of conditional stores, only one final store
matters.  And the current optimization exploits this:

  mem->reg;
  loop
    if (C)
      val->reg;
  reg->mem;      // One final store.

But at the cost of an additional register this final store can be made
conditional (there are cases where even that register is not needed,
but that requires a thorough analysis of val's possible values, i.e. reg
could be initialized to some "invalid" value and then checked for it).
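
In C, the suggested shape would be roughly the following (editor's
sketch; 'written' plays the role of the additional register):

  int mem;

  void
  loop_with_guarded_store(int n, const int *val, const int *cond)
  {
    int reg = mem;
    int written = 0;             /* the additional register */
    int i;

    for (i = 0; i < n; i++)
      if (cond[i])
        {
          reg = val[i];
          written = 1;
        }
    if (written)
      mem = reg;                 /* one final store, and only if the
                                    original code would have stored */
  }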

Registers are a valuable resource, yes.  But so is the correct program
result.  Since GCC is correct wrt all standards, next comes its
usability in not-yet-standardized domains.


> And what do you do if you do not KNOW what the storage class is,
> which is the case 99.99% of the time in C++ member functions?

I'm not quite sure what you mean here.  If you mean extern vs static---that's
of no concern.  What matters is whether the object can possibly be
accessed from another thread, and there is nothing C++-specific about that.



-- 
   Tomash Brechko


Re: RFC: Creating a live, all-encompassing architectural document for GCC

2007-10-26 Thread Diego Novillo
On 10/26/07, Basile STARYNKEVITCH <[EMAIL PROTECTED]> wrote:

> Maybe a possible approach would be to use literate programming
> techniques; from previous experience (still limited), I believe
> it would be more worthwhile on the interface files (i.e. the *.h files,
> some *.opt files, etc.) than on the implementation files (*.c)

Pragmatically, we need to do this without forcing such structural
changes to the implementation.  Moving GCC to another language is
another problem that I'd like to keep separate from this.


> All this is more a social issue than anything else. We developers don't
> like documenting our work (and this is sadly true for me too!)

Agreed.  The documentation mechanism ultimately needs to be easy to
improve.  It will always be incomplete and imperfect.  I simply want
to make it less so.

> (Diego, I am busy preparing the basilys branch, see my GCC summit paper).

Good to hear.  Thanks.


Re: RTL/VCG inconsistency (the check_match.7758 case)

2007-10-26 Thread Diego Novillo
On 10/24/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote:

> Any idea on why this inconsistency happen and how to solve it
> (probably the VCG dumper should somehow use "check_match.7758" as the base
> of the function name and not "check_match") ?

Feel free to offer patches to fix this inconsistency.  The VCG dumper
is not heavily used, so you won't find many folks interested in it.


Re: -fno-tree-cselim not working?

2007-10-26 Thread Andi Kleen
Ian Lance Taylor <[EMAIL PROTECTED]> writes:
>
> This code isn't going to be a problem, because spin_unlock presumably
> includes a memory barrier.

At least in the Linux kernel, and also in glibc for mutexes, locks are just plain
function calls, which are not necessarily full memory barriers.

-Andi



Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
Andi Kleen <[EMAIL PROTECTED]> writes:

> Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> >
> > This code isn't going to be a problem, because spin_unlock presumably
> > includes a memory barrier.
> 
> At least in the Linux kernel, and also in glibc for mutexes, locks are just
> plain function calls, which are not necessarily full memory barriers.

True, and problematic in some cases--but a function call which gcc
can't see is a memory barrier for all addressable memory.
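
(Editor's illustration of that point, with a hypothetical external
function:)

  extern void opaque(void);   /* defined in another file: gcc cannot see it */

  int g;

  void
  f(void)
  {
    g = 1;          /* must be stored before the call, since opaque()
                       might read g */
    opaque();       /* opaque call: acts as a barrier for all
                       addressable memory */
    g = 2;          /* and this store cannot be moved up across the call */
  }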

Ian


Re: -fno-tree-cselim not working?

2007-10-26 Thread Andi Kleen
"Richard Guenther" <[EMAIL PROTECTED]> writes:
>
> I hope we're not trying to support such w/o volatile counter.  Whatever
> POSIX says, this would pessimize generic code too much.

It is dubious whether this transformation is an optimization at all for memory.

E.g. consider the case where the counter is not in the cache.

You'll add a cache miss which will be 2-3 orders of magnitude
more costly than what you can save by not jumping.  Full cache misses are so
expensive that even when they happen rarely they still hurt a lot.

There might be a case for doing it on memory when you can pretty much
guarantee the variable is in L1 (e.g. it is in the stack frame and
you only have a very small stack frame) or only in a register.

But for other cases it's likely better to not do it at all.

BTW there is a cache-friendly (and incidentally thread-safe) alternative way
to eliminate the jump when the CPU has CMOV available.

You can use

int dummy;  // on stack, likely in L1
int *ptr;

ptr = &dummy;
if (cond)   // can be implemented jumpless using CMOV 
   ptr = &counter;  
(*ptr)++;   // increment the value, not the pointer

This will take more registers though.
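
(Editor's sketch of the same idea as a self-contained function;
'counter' and 'cond' stand in for whatever the real code uses:)

  int counter;

  void
  maybe_increment(int cond)
  {
    int dummy = 0;              /* on the stack, likely in L1 */
    int *ptr = &dummy;

    if (cond)                   /* only a register changes here, so the
                                   compiler may use CMOV instead of a jump */
      ptr = &counter;
    (*ptr)++;                   /* one unconditional read-modify-write; it
                                   touches 'counter' only when cond is true */
  }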

-Andi


RE: Creating a live, all-encompassing architectural document for GCC

2007-10-26 Thread Eric Weddington
 

> -Original Message-
> From: Diego Novillo [mailto:[EMAIL PROTECTED] 
> Sent: Friday, October 26, 2007 11:10 AM
> To: gcc@gcc.gnu.org
> Subject: RFC: Creating a live, all-encompassing architectural 
> document for GCC
> 
> 
> It should be easy for an individual maintainer (or even user) to go in
> and modify parts of the document that are incomplete/missing/wrong.
> This and navigability suggest a wikipedia-like approach.  We even have
> the beginnings of some of this in the wiki, so I would like to build
> on that.
...
> However, if
> a patch refactors a module and its internal interfaces are changed,
> then the patch should be accompanied with a change to the
> documentation.
... 
> The documentation for individual modules and files should be linked to
> the actual source code.  Perhaps this could be automatically generated
> with tools like javadoc or doxygen.
...
> So, I think my inclination is to provide this document as a wiki.

I like the goals. But what I see above seems mutually exclusive.

It's reasonable to include doxygen, and change the code with the
documentation simultaneously. Here's the avr-libc user manual online as an
example of the output:

It certainly meets the navigable requirement. I'm sure all of you have seen
other examples as well.

But you also want the user to be able to change an internals document via a
wiki? How does this work with a patch system? How do you propose to resolve
conflicts between a user edit and a maintainer's patch? Maybe I'm ignorant of
the capabilities of a wiki, but this is where it sounds like two incompatible
systems.

Eric Weddington



Re: problem with iv folding

2007-10-26 Thread DJ Delorie

> you cannot add two pointers.  Please create a PR for this and assign it
> to me.

Done, pr 33915

Note that m32c-elf needs --with-newlib

Thanks!


Re: GCC 4.3 release schedule

2007-10-26 Thread Martin Michlmayr
* Joe Buck <[EMAIL PROTECTED]> [2007-10-26 11:44]:
> You might want to hold off on investing the work in fixing those 550
> packages, because I think it's premature to consider the header
> "cleanup" final.
> 
> Can you estimate how many of the broken packages use <ext/hash_map>
> or <ext/hash_set>?

Sorry I wasn't being clear.  This is without the recent removal of
some backward-compatibility headers.  The errors I'm talking about are
due to the fix for PR28080 and similar and those changes will be in
4.3.
-- 
Martin Michlmayr
http://www.cyrius.com/


Re: GCC 4.3 release schedule

2007-10-26 Thread Janis Johnson
On Fri, 2007-10-26 at 19:54 +0200, Eric Botcazou wrote:
> > When I look at the Build status page I see no one has posted a result
> > there for GCC 4.2.2 :
> >
> >   Please see : http://gcc.gnu.org/gcc-4.2/buildstat.html
> 
> Here are a couple of posts by Kaveh:
>   http://gcc.gnu.org/ml/gcc-testresults/2007-10/msg00388.html
>   http://gcc.gnu.org/ml/gcc-testresults/2007-10/msg00390.html
> 
> They are more noisy than usual because of -fpic/-fPIC testing.

I added more entries to gcc-4.2/buildstat.html.

Bootstrap and test results for 4.2.2:

  i686-pc-linux-gnu (Slackware 12.0, kernel 2.6.22, glibc 2.5)

Test results for 4.2.2:

  hppa2.0w-hp-hpux11.11
  hppa64-hp-hpux11.11
  hppa-unknown-linux-gnu
  i386-unknown-freebsd5.5
  i686-pc-linux-gnu (2)
  ia64-unknown-linux-gnu
  s390-ibm-linux-gnu
  s390x-ibm-linux-gnu
  sparc-sun-solaris2.8
  sparc64-unknown-linux-gnu
  x86_64-unknown-linux-gnu (2)

Sorry for the delay.

Janis



Re: GCC 4.3 release schedule

2007-10-26 Thread Eric Botcazou
> I added more entries to gcc-4.2/buildstat.html.
>
> Bootstrap and test results for 4.2.2:
>
>   i686-pc-linux-gnu (Slackware 12.0, kernel 2.6.22, glibc 2.5)
>
> Test results for 4.2.2:
>
>   hppa2.0w-hp-hpux11.11
>   hppa64-hp-hpux11.11
>   hppa-unknown-linux-gnu
>   i386-unknown-freebsd5.5
>   i686-pc-linux-gnu (2)
>   ia64-unknown-linux-gnu
>   s390-ibm-linux-gnu
>   s390x-ibm-linux-gnu
>   sparc-sun-solaris2.8
>   sparc64-unknown-linux-gnu
>   x86_64-unknown-linux-gnu (2)

Thanks!

-- 
Eric Botcazou


Re: GCC 4.3 release schedule

2007-10-26 Thread Andrew MacLeod

Mark Mitchell wrote:

Andrew MacLeod wrote:

  

we can at least make projected dates known so we have something firmer
than "at some point in the future" :-)



As RM, I try to take into account what I know about when distributors
will be applying effort, but I must absolutely avoid in any way tilting
the FSF release process towards the needs of one distributor, possibly
at the expense of another.  I don't think it's appropriate for us to set
a schedule tailored to any one distributor's needs -- and there are a
lot more distributors than just Red Hat and SuSE, so I'd say that even
if you were on the same schedule.  But, I certainly do think it's
helpful for a contributor to tell us when resources might be available
and I appreciate you sharing that information.

  


I'm not suggesting we tailor the schedule to a specific distributor, but 
I do think that when we have useful information that a client of GCC will be 
choosing a release by $date, it might be worth considering how that fits 
into the current or future release schedules.  Fortunately we seem to 
have an alignment of the planets at the moment, so it doesn't appear it 
will be much of an issue for this release; we got lucky.  Twelve months ago, 
it might have actually been planned instead of luck, had Fedora and SUSE 
said their plans were to be looking for a compiler in early Q3/2007 and 
then again in Q1/2008.  And if other distributions provided approximate 
dates, we'd see where "ooo, look, 5 distributions will be looking for a 
compiler in Q1/2008.  Perhaps we should try to arrange our schedule to 
have a release available then.  That means stage 1 goes to June, stage 2 
goes to mid September, and that should result in a release in late 
Q4/early Q1 which all those distributions will be interested in".  I 
think we'd see a lot of resources pumped into getting that release out.  
If the schedule had shown that no more than one distribution was looking for a 
release until June of next year, perhaps we would decide to delay things 
further to allow more development.


One of the reasons we sometimes languish in stage 3 is that a release 
is not interesting to enough parties.  They end up spending their 
resources working on a future release which will be of interest to them, 
and stage 3 drags on.  I think our best bet at making stage 3 practical 
and short is to have enough interest in getting the release out.  And 
the reality of the situation is that you probably need a couple of 
distributions interested in it.  It looks like that's the case with this 
release, and I am hopeful that we are going to have a reasonable stage 3 
and get a release out in a timely fashion.  In the future, we can 
probably accomplish the same thing if we try to align a release date 
with interested parties.




If you're interested in driving the release to a particular date, the
best thing you can do is to go clear out the P1s in bugzilla and then
bash out a few P2s.  (I've noticed Red Hat folks doing some of that
already, thanks!)  I'd imagine that the dates you want to hit would be
achievable if you, Jakub, Jason, etc. all work on issues.

  
We do intend to do that, but it's easier to get the resource assignments 
if we can say "gcc 4.3 is currently planned to be released on Feb 8th, 
but we need 25% of 3 developers for 3 months to help make sure".  It 
boils down to the same thing to you and me, but not to the other 
projects and people involved.  If we can set a trend like this, and we 
meet a couple of release dates accurately, we might be able to regularly 
get resources assigned to releases.




I've found schedules for GCC to be very hard to predict.  As I said in
my status report, our practice has been to cut the release branch when
we reach 100 regressions, and release 2-4 months after that point,
depending on quality on the branch.  To be honest, I'd rather wait
longer to make the branch -- but there tends to be intense pressure in
the developer community to make a branch so we can get on to the next
round of major features.  In any case, after we make the branch, it's in
regression-only mode, so stability tends to be quite good, though
dot-zero releases are, after all, dot-zero releases.
  


Yeah, I'm not so concerned about whether we cut a release branch early 
or not.  Cutting a branch, or saying mainline is regression only, or 
whatever mechanism, boils down to the same thing.  The key is to get 
people working on it because the release is needed.  If the release is 
needed, it's likely to be needed by a certain date.  In the past our dates 
have been set fairly arbitrarily by ourselves, right?  If our date 
coincides with dates that others actually need the compiler by, I bet we 
see a lot less slippage.  I just suggest we try it; it's really not that 
big a change in my view, and I think it may solve some of our problems in 
getting focus on releases.  I have no doubt that if we set a date in 
early February, we'll probably make it.  And I'd like to see if we can 
reprodu

Re: Creating a live, all-encompassing architectural document for GCC

2007-10-26 Thread Diego Novillo
On 10/26/07, Eric Weddington <[EMAIL PROTECTED]> wrote:


> 
> It certainly meets the navigable requirement. I'm sure all of you have seen
> other examples as well.

Thanks.  I'll take a look.

>
> But you also want the user to be able to change an internals document via a
> wiki? How does this work with a patch system? How do you propose to resolve
> conflicts between a user edit and a maintainer's patch? Maybe I'm ignorant of
> the capabilities of a wiki, but this is where it sounds like two incompatible
> systems.

That's just one option.  The impedance between wiki and the source
tree is that the wiki is not patch-based, which makes it more
accessible but also increases the chances that the code and the
documentation will diverge.

I maintain that divergence is inevitable, regardless of whether the
code and the documentation live together or not (without getting into
Literate Programming arguments).  So, we need to make it easy to keep
them both in sync.

Moving to a pure document extraction system like doxygen or javadoc
may be useful, but the problem there is that we miss all the
whole-system documentation, API interfaces, behaviours, etc.
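
(Editor's sketch of what such extracted documentation looks like, in
doxygen style; the function and parameters are made up, not a real GCC
interface:)

  struct function;

  /** Renumber the basic blocks of function FN.
   *
   *  @param fn       the function whose CFG should be renumbered
   *  @param compact  nonzero to also compact the block array
   *  @return the number of basic blocks after renumbering  */
  int renumber_basic_blocks (struct function *fn, int compact);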


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Florian Weimer
* Andrew Haley:

> The core problem here seems to be that the "C with threads" memory
> model isn't sufficiently well-defined to make a determination
> possible.  You're assuming that you have no responsibility to mark
> shared memory protected by a mutex as volatile, but I know of nothing
> in the C standard that makes such a guarantee.  A prudent programmer
> will make conservative assumptions.

Sprinkling volatile all over the place looks like the wrong answer.
It disables many optimizations, so you could probably use a simpler
compiler which doesn't perform the problematic optimizations in the
first place.

Not creating spurious stores seems to be a saner approach.  Hans Boehm's
concerns still apply, of course, but with knowledge of the architecture
and GCC's existing support for optimization barriers, programmers
probably have enough control to produce what they need.
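
(Editor's note: one such barrier is GCC's empty asm with a "memory"
clobber.  A minimal sketch of its use, with a hypothetical global:)

  extern int shared_counter;

  void
  f(int cond)
  {
    if (cond)
      {
        shared_counter++;
        __asm__ __volatile__ ("" : : : "memory");  /* compiler-level barrier:
                                                      memory values are not
                                                      cached in registers
                                                      across it, nor are
                                                      accesses moved past it */
      }
  }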


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Daniel Jacobowitz
On Fri, Oct 26, 2007 at 02:24:21PM -0700, Ian Lance Taylor wrote:
> What do people think of this patch?  This seems to fix the problem
> case without breaking Michael's case.  It basically avoids store
> speculation: we don't write to a MEM unless the function
> unconditionally writes to the MEM anyhow.
> 
> This is basically a public relations exercise.  I doubt this
> optimization is especially important, so I think it's OK to disable it
> to keep people happy.  Even though the optimization has been there
> since gcc 3.4 and nobody noticed.
> 
> Of course this kind of thing will break again until somebody takes the
> time to fully implement something like the C++0x memory model.

Right.  In fact it seems to me to be still broken; you just need a
bigger test case.

  if (trylock)
    { var++; unlock; }

  sleep

  lock
  var++;
  unlock

I'm sure someone can turn that into a sensible looking example, with a
little inlining.
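
(Editor's attempt at such an example, assuming POSIX threads; after
inlining, the shape the optimizer sees would be similar:)

  #include <pthread.h>
  #include <unistd.h>

  int var;
  pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

  void
  f(void)
  {
    if (pthread_mutex_trylock(&mutex) == 0)   /* 0 means we got the lock */
      {
        var++;
        pthread_mutex_unlock(&mutex);
      }

    sleep(1);

    pthread_mutex_lock(&mutex);
    var++;           /* unconditional store to var: a post-dominator scan
                        could take this as licence to speculate the first
                        var++, even though the intervening calls mean it
                        must stay conditional */
    pthread_mutex_unlock(&mutex);
  }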

-- 
Daniel Jacobowitz
CodeSourcery


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Diego Novillo
On 26 Oct 2007 14:24:21 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

> What do people think of this patch?  This seems to fix the problem
> case without breaking Michael's case.  It basically avoids store
> speculation: we don't write to a MEM unless the function
> unconditionally writes to the MEM anyhow.

I think it couldn't hurt.  Providing it as a QOI feature might be
good.  However, we should predicate these changes on a -fthread-safe
flag.  More and more of these corner cases will start popping up.


Re: -fno-tree-cselim not working?

2007-10-26 Thread Ian Lance Taylor
Andrew Haley <[EMAIL PROTECTED]> writes:

> Ian Lance Taylor writes:
> 
>  > As I understand it, the draft C++0x memory model has acquire release
>  > semantics for annotated variables.  Of course, it wouldn't help the
>  > originalk test case unless the global variable was annotated.
> 
> Mmm, but one of the authors of the draft C++0x memory model tells me
> that the controversial optimization gcc is performing is definitely
> illegal under that model, regardless of how the variables are
> annotated.  I haven't yet got deep enough into the working paper to be
> able to point you at exactly where it says so.

I was wrong, and you are right.

(I just checked with our local C++ standards representative, who has also
done a great deal of work on the memory model.)

Ian


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Ian Lance Taylor
Michael Matz <[EMAIL PROTECTED]> writes:

> Both the assessment of far-fetchedness and these numbers seem to be 
> invented ad hoc.  The latter is irrelevant (it's not interesting how many 
> cases there are, but how important those cases which occur are, for some 
> metric, let's say performance).  And the former isn't true, i.e. the 
> concern is not far-fetched.  For 456.hmmer for instance it is crucial 
> that this transformation happens, the basic situation looks like so:

What do people think of this patch?  This seems to fix the problem
case without breaking Michael's case.  It basically avoids store
speculation: we don't write to a MEM unless the function
unconditionally writes to the MEM anyhow.

This is basically a public relations exercise.  I doubt this
optimization is especially important, so I think it's OK to disable it
to keep people happy.  Even though the optimization has been there
since gcc 3.4 and nobody noticed.

Of course this kind of thing will break again until somebody takes the
time to fully implement something like the C++0x memory model.

I haven't tested this patch.

Ian

Index: ifcvt.c
===================================================================
--- ifcvt.c (revision 128958)
+++ ifcvt.c (working copy)
@@ -2139,6 +2139,32 @@ noce_mem_write_may_trap_or_fault_p (cons
   return false;
 }
 
+/* Return whether a MEM is unconditionally set in the function
+   following TOP_BB.  */
+
+static bool
+noce_mem_unconditionally_set_p (basic_block top_bb, const_rtx mem)
+{
+  basic_block dominator;
+
+  for (dominator = get_immediate_dominator (CDI_POST_DOMINATORS, top_bb);
+   dominator != NULL;
+   dominator = get_immediate_dominator (CDI_POST_DOMINATORS, dominator))
+{
+  rtx insn;
+
+  FOR_BB_INSNS (dominator, insn)
+   {
+ if (memory_modified_in_insn_p (mem, insn))
+   return true;
+ if (modified_in_p (XEXP (mem, 0), insn))
+   return false;
+   }
+}
+
+  return false;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
it without using conditional execution.  Return TRUE if we were successful
at converting the block.  */
@@ -2292,17 +2318,31 @@ noce_process_if_block (struct noce_if_in
   goto success;
 }
 
-  /* Disallow the "if (...) x = a;" form (with an implicit "else x = x;")
- for optimizations if writing to x may trap or fault, i.e. it's a memory
- other than a static var or a stack slot, is misaligned on strict
- aligned machines or is read-only.
- If x is a read-only memory, then the program is valid only if we
- avoid the store into it.  If there are stores on both the THEN and
- ELSE arms, then we can go ahead with the conversion; either the
- program is broken, or the condition is always false such that the
- other memory is selected.  */
-  if (!set_b && MEM_P (orig_x) && noce_mem_write_may_trap_or_fault_p (orig_x))
-return FALSE;
+  if (!set_b && MEM_P (orig_x))
+{
+  /* Disallow the "if (...) x = a;" form (implicit "else x = x;")
+     for optimizations if writing to x may trap or fault,
+     i.e. it's a memory other than a static var or a stack slot,
+     is misaligned on strict aligned machines or is read-only.  If
+     x is a read-only memory, then the program is valid only if we
+     avoid the store into it.  If there are stores on both the
+     THEN and ELSE arms, then we can go ahead with the conversion;
+     either the program is broken, or the condition is always
+     false such that the other memory is selected.  */
+  if (noce_mem_write_may_trap_or_fault_p (orig_x))
+   return FALSE;
+
+  /* Avoid store speculation: given "if (...) x = a" where x is a
+     MEM, we only want to do the store if x is always set
+     somewhere in the function.  This avoids cases like
+       if (pthread_mutex_trylock(mutex))
+         ++global_variable;
+     where we only want global_variable to be changed if the mutex
+     is held.  FIXME: This should ideally be expressed directly in
+     RTL somehow.  */
+  if (!noce_mem_unconditionally_set_p (test_bb, orig_x))
+   return FALSE;
+}
 
   if (noce_try_move (if_info))
 goto success;
@@ -3957,7 +3997,7 @@ dead_or_predicable (basic_block test_bb,
 /* Main entry point for all if-conversion.  */
 
 static void
-if_convert (bool recompute_dominance)
+if_convert (void)
 {
   basic_block bb;
   int pass;
@@ -3977,9 +4017,8 @@ if_convert (bool recompute_dominance)
   loop_optimizer_finalize ();
   free_dominance_info (CDI_DOMINATORS);
 
-  /* Compute postdominators if we think we'll use them.  */
-  if (HAVE_conditional_execution || recompute_dominance)
-calculate_dominance_info (CDI_POST_DOMINATORS);
+  /* Compute postdominators.  */
+  calculate_dominance_info (CDI_POST_DOMINATORS);
 
   df_set_flags (DF_LR_RUN_DCE);
 
@@ -4068,7 +4107,7 @@ rest_of_handle_if_conversion (void)
  

Re: How widely used are <ext/hash_map> and <ext/hash_set>?

2007-10-26 Thread Marcus Meissner
On Thu, Oct 25, 2007 at 09:40:06PM -0700, Joe Buck wrote:
> The thread arguing about this has gone on for a while, so I think
> it's time to gather some data to answer the question of just how bad
> it will be if we accept the decision to move ext/hash_set and ext/hash_map
> into a different directory and to deprecate them.
> 
> Any of you out there who put out distros or port collections: how many 
> packages in your distro use these classes?  You can grep for
> 
> '# *include * 
> I'm particularly interested in the "Debian number", since the package
> collection is so large.  Other information about use of these classes
> will be interesting as well.

For SUSE the ones the grep above found:
- kdevelop3
- amarok
- xmms-kde
- apt
- abiword
- pan
- scim
- kseg
- pdns
Mostly one or two occurrences.

Ciao, Marcus


Re: GCC 4.3 release schedule

2007-10-26 Thread Joe Buck
On Fri, Oct 26, 2007 at 10:28:35PM +0200, Martin Michlmayr wrote:
> * Joe Buck <[EMAIL PROTECTED]> [2007-10-26 11:44]:
> > You might want to hold off on investing the work in fixing those 550
> > packages, because I think it's premature to consider the header
> > "cleanup" final.
> > 
> > Can you estimate how many of the broken packages use <ext/hash_map>
> > or <ext/hash_set>?
> 
> Sorry I wasn't being clear.  This is without the recent removal of
> some backward-compability headers.  The errors I'm talking about are
> due to the fix for PR28080 and similar and those changes will be in
> 4.3.

OK.  Can you estimate how many packages use <ext/hash_map> or
<ext/hash_set>?



Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Ian Lance Taylor
"Diego Novillo" <[EMAIL PROTECTED]> writes:

> On 26 Oct 2007 14:24:21 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> 
> > What do people think of this patch?  This seems to fix the problem
> > case without breaking Michael's case.  It basically avoids store
> > speculation: we don't write to a MEM unless the function
> > unconditionally writes to the MEM anyhow.
> 
> I think it couldn't hurt.  Providing it as a QOI feature might be
> good.  However, we should predicate these changes on a -fthread-safe
> flag.  More and more of these corner cases will start popping up.

It appears that the draft C++0x memory model prohibits speculative
stores.

Therefore I now think we should aim toward prohibiting them
unconditionally.  That memory model is still just a draft, but I think we
should implement it unconditionally (rather than behind a flag) once it
is finalized.

Ian


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Jakub Jelinek
On Fri, Oct 26, 2007 at 02:24:21PM -0700, Ian Lance Taylor wrote:
> What do people think of this patch?  This seems to fix the problem
> case without breaking Michael's case.  It basically avoids store
> speculation: we don't write to a MEM unless the function
> unconditionally writes to the MEM anyhow.

This still isn't enough.  If you have a non-pure/non-const CALL_INSN
before the unconditional store into the MEM, you need to return false from
noce_mem_unconditionally_set_p as that function could have a barrier
in it.  Similarly for inline asm or __sync_* builtin generated insns
(not sure ATM if just stopping on UNSPEC_VOLATILE/ASM_INPUT/ASM_OPERANDS
or something else is needed).

Jakub

