Fwd: Re: [PATCH][4.3] Deprecate -ftrapv

2008-03-04 Thread amylaar

Somehow this got stuck in the spam filter.

- Forwarded message from [EMAIL PROTECTED] -
Date: Sat, 01 Mar 2008 09:21:21 -0500
From: Joern Rennecke <[EMAIL PROTECTED]>
Reply-To: Joern Rennecke <[EMAIL PROTECTED]>
Subject: Re: [PATCH][4.3] Deprecate -ftrapv
To: gcc@gcc.gnu.org
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]

On Fri, 29 Feb 2008, Robert Dewar wrote:

Well presumably one would want to use target dependent stuff for
detecting overflow where it exists (sticky overflow bits on
power, O flag on PC, trapping add on MIPS etc).


In fact, when I wrote the original -ftrapv code, it was for the sole purpose
of using the trapping add on mips.

On Sat, 1 Mar 2008, Joseph S. Myers wrote:

The only targets defining the v insn patterns at present
appear to be alpha and pa.


Considering the trouble that you get when you try to generate branches in
a non-branch expander, we should probably have alternate named patterns
to be used in ports to processors that have no conditional trap facility,
or where a conditional trap is more expensive than a well predictable
conditional branch.
We want arithmetic-and-branch-on-overflow patterns for these.
One peculiarity of these patterns would be that they would be required
to expand into more than one instruction, since the write of the result
must not be in the same instruction as the branch due to reload limitations.
Thus the overflow condition in CC0 / other flags register / predicate
register has to be actually exposed in rtl to show the dependency between
arithmetic and branch.
We should document this quirk in the description of these named patterns.

When the machine independent expander machinery wants to expand a
trapping arithmetic operation that has no matching named pattern defined
by the port, and there is no conditional trap defined, it can then use
the arithmetic-and-branch-on-overflow pattern to branch to an abort call
if an overflow occurs.
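At the source level, the fallback described above (no trapping pattern, no conditional trap, so branch to an abort call on overflow) behaves roughly like the sketch below. It is only an illustration of the control flow, not what the expander would emit; the name checked_add is hypothetical, and because C cannot read the hardware overflow flag, the overflow condition is computed from the operands instead.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: "add, then branch to abort() on overflow".
   A port would test the overflow flag after the add; portable C has no
   access to that flag, and signed overflow is undefined behavior, so
   the overflow condition is checked before adding.  */
static int checked_add (int a, int b)
{
  /* Overflow iff a + b would exceed INT_MAX or fall below INT_MIN.  */
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    abort ();   /* the branch-to-abort path taken on overflow */
  return a + b;
}
```

A trapping-add or conditional-trap pattern replaces the explicit test and branch with a single hardware check, which is why the branch-based patterns are only the fallback.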

To allow branch inversion to work, we don't need to do anything special
if the condition is expressed as a comparison against 0 of an 'integer'
flag register or a predicate bit.  However, if the condition is in CC0
or a CCmode flags register, we want a way to express the overflow
and non-overflow conditions so that reverse_condition or REVERSE_CONDITION
can do its work.

I see two possibilities here.  For simplicity I will describe them
here in terms of CC0, although many target ports would actually use a
scheduler-exposed flags register with an appropriate CCmode mode.
- We could have (overflow CC0 0) and (nooverflow CC0 0), where
   overflow and nooverflow are two new comparison codes, and the trailing
   0 is a dummy argument for the sake of consistency with comparison
   operators.
- We could have (ge CC0 overflow) and (lt CC0 overflow), where overflow
   is a new one-of-a-kind RTX object.


- End forwarded message -


Re: Benchmarks: 7z, bzip2 & gzip.

2008-03-04 Thread Martin Guy
2008/2/29, J.C. Pizarro <[EMAIL PROTECTED]>:
> Here are the results of benchmarks of 3 compressors: 7z, bzip2 and gzip, and
>  GCCs 3.4.6, 4.1.3-20080225, 4.2.4-20080227, 4.3.0-20080228 & 4.4.0-20080222.

Thanks, that's very interesting. I had noticed 4.2 producing 10%
larger and 10% slower code for a sample code fragment for ARM but
couldn't follow it up.

Is there a clause in regressions for "takes longer to compile and
produces worse code"?

M


Re: GCC 4.3.0 Status Report (2008-03-03)

2008-03-04 Thread Richard Guenther
On Mon, 3 Mar 2008, H.J. Lu wrote:

> Hi,
> 
> I'd like to fix
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35453
> 
> for gcc 4.3. Defining SIDD_XXX in the SSE4 header file is a bad idea. The SSE4
> header file in icc will also be fixed.

Works for me.

Richard.


Getting GCC to always dllimport vtables on X86?

2008-03-04 Thread Reuben Harris
Hi,


Sure hope I've come to the right place...

I need to somehow persuade GCC (on x86) to always treat vtables as if
they were dllimport'ed. For linking to work on my target platform (a
custom X86 OS) it's important that constructors reference vtables
indirectly (i.e. through pointers in idata). The other side of this
coin is a small hack to ld to allow dllimports to work, not just from
importing modules, but from the exporting module as well (i.e. __imp__
symbols for vtables get created automatically once ld detects that they
are unresolved).

Is there anyone here who would be willing to show me the way with
this? Although I am a proficient C/C++ programmer, I am a GNU noob and
the GCC source code scares me... :-)  I think I've found the right
place in GCC - import_export_vtable in decl2.c - but am at a loss to
understand the ld source.

Many thanks,


-- Reuben Harris


Re: atomic accesses

2008-03-04 Thread Robert Dewar

Segher Boessenkool wrote:
The Linux kernel, and probably some user-space applications and
libraries as well, depend on GCC guaranteeing (a variant of) the following:
"any access to a naturally aligned scalar object in memory
that is not a bit-field will be performed by a single machine
instruction whenever possible"
and it seems the current compiler actually does work like this.

Seems a pity to have the bit-field exception here, why is it there?

Bit-fields will generally require a read-modify-write instruction,
and I don't think we actually guarantee to generate one right now.

Well if they do require more than one instruction, the rule has
no effect ("whenever possible"). If they can be done in one
instruction  (as on the x86), then why not require this, why
make a special case?


Because current GCC doesn't work like this AFAIK.  I'm aiming for
a documentation-only change here, we can always extend it later.


Fair enough, we don't want to document something we don't do!

Does this rule extend to the use of floating-point instructions
to guarantee atomic access to a 64-bit long_long_integer? As
written, it does!



Segher




Information regarding issue with While Loop with O3 optimization

2008-03-04 Thread Raghukrishna Hegde
Hello all,
I am encountering a strange problem. I have a code
snippet that contains a while loop.

The snippet is as follows:

 while ((expr1) && (expr2));

Initially, both expr1 and expr2 are
set to 1.

Next, only expr1 is set to 0 within a
SIGINT handler.

I compile this program with -O3 optimization. The gcc
version information is as follows:
gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52)

When I run this program, it goes into a tight loop.
When I then send a SIGINT to the program,
it does not exit.

I repeat the same procedure, compiling the same
code with the -O0 optimization option. Now when I
send the SIGINT signal, the program exits.

Has anyone encountered similar behavior?
Is there any issue with GCC 4.1.1 related to the -O3 optimization
option?

Regards
Raghu.



Re: Information regarding issue with While Loop with O3 optimization

2008-03-04 Thread Robert Dewar

Raghukrishna Hegde wrote:

Hello all,
I am encountering a strange problem. I have a code
snippet that contains a while loop.

The snippet is as follows:

 while ((expr1) && (expr2));

Initially, both expr1 and expr2 are
set to 1.

Next, only expr1 is set to 0 within a
SIGINT handler.


You need to make expr1 volatile for this to work; otherwise the
optimizer does not have to allow for the possibility of a handler
changing the variable.
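A minimal sketch of the fix, assuming expr1 is a plain flag variable. The names keep_running (standing in for expr1), other_condition (standing in for expr2) and the two helper functions are hypothetical. Strictly, ISO C only guarantees this pattern for objects of type volatile sig_atomic_t, so the sketch uses that type.

```c
#include <signal.h>

/* Hypothetical stand-ins: keep_running plays expr1, other_condition
   plays expr2.  volatile forces a fresh load of keep_running on every
   loop iteration, even at -O3.  */
static volatile sig_atomic_t keep_running = 1;
static int other_condition = 1;

static void on_sigint (int sig)
{
  (void) sig;
  keep_running = 0;   /* writing a volatile sig_atomic_t is portable here */
}

/* Returns 0 on success, -1 if the handler could not be installed.  */
static int install_sigint_handler (void)
{
  return signal (SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* The original busy-wait.  Without volatile, the optimizer may hoist the
   load of keep_running out of the loop and spin forever.  */
static void wait_for_sigint (void)
{
  while (keep_running && other_condition)
    ;
}
```

raise (SIGINT) runs the handler synchronously, which makes the behavior easy to check without a second process.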




Re: Interoperability of Fortran array and C vector?

2008-03-04 Thread FX Coudert

But the remaining question is: can we
support type interoperability from Fortran array to C vector?


I think this is more a middle-end issue than a Fortran issue, so I'm
following up there: can the middle end do a VIEW_CONVERT_EXPR between an
ARRAY_REF of, say, INTEGER_TYPE (which is what the Fortran array is)
and a VECTOR_TYPE?
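For reference, the C side of such a conversion can be sketched with GCC's vector extension. This assumes a 32-bit int and uses hypothetical helper names; memcpy is the well-defined C way to reinterpret the bits, which is roughly what a VIEW_CONVERT_EXPR between the two types would express in the middle end.

```c
#include <string.h>

/* GCC vector extension: 16 bytes holding four ints.  */
typedef int v4si __attribute__ ((vector_size (16)));

_Static_assert (sizeof (v4si) == 4 * sizeof (int),
                "sketch assumes 32-bit int");

/* Hypothetical helper: reinterpret an int[4] (e.g. a Fortran INTEGER
   array passed to C) as a v4si without violating aliasing rules.  */
static v4si array_to_v4si (const int a[4])
{
  v4si v;
  memcpy (&v, a, sizeof v);
  return v;
}

/* Hypothetical helper: the reverse conversion.  */
static void v4si_to_array (v4si v, int a[4])
{
  memcpy (a, &v, sizeof v);
}
```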


I'm CCing the main GCC devel list, since we might have more answers
from there.


FX

--
François-Xavier Coudert
http://www.homepages.ucl.ac.uk/~uccafco/



Re: Swing replacements

2008-03-04 Thread Joel Dice

On Mon, 3 Mar 2008, [EMAIL PROTECTED] wrote:


I have a stand-alone, non-Web-based app. that I'd like to
distribute as a .exe with some database files, to a layman
audience, and I'd like to avoid issues of JRE distribution and
compatibility, etc. So I'm hoping someone, somewhere, has
written a replacement framework for Java's GUI classes. Can
you by any chance point me in such a direction?


I haven't used it myself, but SwingWT (http://swingwt.sourceforge.net/) in
combination with SWT might be what you're looking for.  See also
(http://thisiscool.com/gcc_mingw.htm).  BTW, [EMAIL PROTECTED] is a more 
appropriate list for further discussion, at least concerning GCJ.


Re: atomic accesses

2008-03-04 Thread Jakub Jelinek
On Mon, Mar 03, 2008 at 11:08:24PM -0500, Robert Dewar wrote:
> Segher Boessenkool wrote:
> >>>The Linux kernel, and probably some user-space applications and 
> >>>libraries
> >>>as well, depend on GCC guaranteeing (a variant of) the following:
> >>>   "any access to a naturally aligned scalar object in memory
> >>>   that is not a bit-field will be performed by a single machine
> >>>   instruction whenever possible"
> >>>and it seems the current compiler actually does work like this.
> >>Seems a pity to have the bit-field exception here, why is it there?
> >
> >Bit-fields will generally require a read-modify-write instruction,
> >and I don't think we actually guarantee to generate one right now.
> 
> Well if they do require more than one instruction, the rule has
> no effect ("whenever possible"). If they can be done in one
> instruction  (as on the x86), then why not require this, why
> make a special case?

Because for the consumers whether the operation is done using
a single machine instruction is uninteresting.  What matters is
if that instruction is atomic.  x86 read-modify-write instructions
aren't atomic unless the lock prefix is used (and we definitely don't
want to use the lock prefix on all bitfield accesses); without it, the
instruction is executed as separate read, modify and write uops.
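The distinction can be sketched in C. The flag word and function names are hypothetical; __sync_fetch_and_or is a documented GCC builtin, and on x86 GCC emits it as a lock-prefixed instruction, while the plain |= may become a single unlocked "or" to memory that is still not atomic with respect to other CPUs.

```c
/* Hypothetical flag word shared between threads.  */
static unsigned int flags;

/* Plain read-modify-write: possibly one x86 instruction, but without a
   lock prefix another CPU's concurrent update can be lost.  */
static void set_flag_plain (unsigned int bit)
{
  flags |= bit;
}

/* GCC builtin: the whole read-modify-write is atomic (lock-prefixed on
   x86).  The returned old value is not needed here.  */
static void set_flag_atomic (unsigned int bit)
{
  (void) __sync_fetch_and_or (&flags, bit);
}
```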

Jakub


Constrain valid arguments to BIT_FIELD_REF

2008-03-04 Thread Richard Guenther

BIT_FIELD_REF is currently only generated by the middle-end (fold, SRA
and parts of the vectorizer).  At the moment the bit position and size
of the extract can be non-constant and the type of the result is
unspecified.

I suggest making sure that bit position and size are constants, the
object referenced is of integral type (BIT_FIELD_REF should not be
used as a way to circumvent aliasing) and the result type is of the
same type as the operand zero type (and not a bitfield type of the
referenced size -- in which case the BIT_FIELD_REF_UNSIGNED would
be useless).  The result would then be properly extended according
to BIT_FIELD_REF_UNSIGNED.

Is this how it was intended?
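The proposed semantics can be modelled in plain C for a 32-bit operand. This is an illustration only: bit_field_ref is a hypothetical helper, not a GCC internal; it takes operands in the same order as the tree code (object, size, position), numbers bits from the least significant end, and uses is_unsigned to stand in for BIT_FIELD_REF_UNSIGNED. The result has the operand's type and is zero- or sign-extended accordingly.

```c
#include <stdint.h>

/* Hypothetical model of BIT_FIELD_REF <op0, size, pos> on a uint32_t.
   Requires 0 < size and pos + size <= 32.  */
static uint32_t
bit_field_ref (uint32_t op0, unsigned size, unsigned pos, int is_unsigned)
{
  uint32_t field;

  if (size == 32)
    field = op0;   /* pos must be 0 */
  else
    field = (op0 >> pos) & ((UINT32_C (1) << size) - 1);

  /* Signed extract: replicate the field's top bit into the upper bits.  */
  if (!is_unsigned && size < 32 && ((field >> (size - 1)) & 1))
    field |= ~((UINT32_C (1) << size) - 1);

  return field;
}
```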

fold currently optimizes a.b.c == 0 to BIT_FIELD_REF  & 1
for bit field field-decls c.  IMHO this is bad because it pessimizes
TBAA (needs to use a's alias set, not the underlying integral type
alias set) and it "breaks" type correctness as arbitrary structure
types appear as operand zero.

?

Thanks,
Richard.


Re: atomic accesses

2008-03-04 Thread Paul Brook
> Well if they do require more than one instruction, the rule has
> no effect ("whenever possible"). If they can be done in one
> instruction  (as on the x86), then why not require this, why
> make a special case?

We don't even guarantee consistent behavior for volatile bitfields, so I 
really doubt we can guarantee it for non-volatile bitfields.

In particular "int32_t foo:8;" may use either an 8-bit or a 32-bit access,
depending on what the compiler feels like.

Paul


Re: atomic accesses

2008-03-04 Thread Paul Koning
I'm really wondering why this is being considered.

A documented property of the form "GCC will use a single instruction
to do X when possible" means exactly nothing.  In particular, to call
such a statement a "guarantee" is seriously misleading.

If Linux needs the single-instruction property for atomicity, and it
thinks it can rely on this supposed property, then Linux has a bug.
To do atomic operations, you have to use primitives that are
guaranteed always to have the necessary atomicity properties.
Typically those would be found in asm statements.

I suspect it would be valuable to have standardized primitives for
atomic actions (semaphores, spinlocks, test-and-set primitives,
circular buffers, pick one).  But GCC's load/store semantics are not
those primitives, with or without a documented "single instruction
when possible" property.  

   paul




Re: Constrain valid arguments to BIT_FIELD_REF

2008-03-04 Thread Diego Novillo

On 3/4/08 10:55 AM, Richard Guenther wrote:


I suggest to make sure that bit position and size are constants, the
object referenced is of integral type (BIT_FIELD_REF should not be
used as a way to circumvent aliasing) and the result type is of the
same type as the operand zero type (and not a bitfield type of the
referenced size -- in which case the BIT_FIELD_REF_UNSIGNED would
be useless).  The result would then be properly extended according
to BIT_FIELD_REF_UNSIGNED.

Is this how it was intended?


If it wasn't, I think the semantics you propose are fine.  If this is 
only generated by the ME, it should be easy to change.




fold currently optimizes a.b.c == 0 to BIT_FIELD_REF  & 1
for bit field field-decls c.  IMHO this is bad because it pessimizes
TBAA (needs to use a's alias set, not the underlying integral type
alias set) and it "breaks" type correctness as arbitrary structure
types appear as operand zero.


Agreed.  Unless this was done to fix some target-specific problem, I 
think it should disappear.



Diego.


Re: atomic accesses

2008-03-04 Thread Andrew Haley

Paul Koning wrote:

I'm really wondering why this is being considered.

A documented property of the form "GCC will use a single instruction
to do X when possible" means exactly nothing.  In particular, to call
such a statement a "guarantee" is seriously misleading.


I agree.


If Linux needs the single-instruction property for atomicity, and it
thinks it can rely on this supposed property, then Linux has a bug.
To do atomic operations, you have to use primitives that are
guaranteed always to have the necessary atomicity properties.


Yes.


Typically those would be found in asm statements.



I suspect it would be valuable to have standardized primitives for
atomic actions (semaphores, spinlocks, test-and-set primitives,
circular buffers, pick one).


We already have these in gcc, and they're even documented.
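As an example of those documented primitives, a minimal spinlock can be built from __sync_lock_test_and_set and __sync_lock_release. The lock word and function names are hypothetical; a production lock would also want a pause/backoff in the spin loop.

```c
/* Hypothetical lock word: 0 = free, 1 = held.  */
static volatile int lock_word;

/* __sync_lock_test_and_set atomically stores 1 and returns the previous
   value, with acquire semantics.  Spin while someone else holds it.  */
static void spin_lock (volatile int *l)
{
  while (__sync_lock_test_and_set (l, 1))
    ;
}

/* One attempt: returns 1 if the lock was acquired, 0 otherwise.  */
static int spin_trylock (volatile int *l)
{
  return __sync_lock_test_and_set (l, 1) == 0;
}

/* __sync_lock_release stores 0 with release semantics.  */
static void spin_unlock (volatile int *l)
{
  __sync_lock_release (l);
}
```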

Andrew.


Re: Constrain valid arguments to BIT_FIELD_REF

2008-03-04 Thread Jakub Jelinek
On Tue, Mar 04, 2008 at 11:15:00AM -0500, Diego Novillo wrote:
> >fold currently optimizes a.b.c == 0 to BIT_FIELD_REF  & 1
> >for bit field field-decls c.  IMHO this is bad because it pessimizes
> >TBAA (needs to use a's alias set, not the underlying integral type
> >alias set) and it "breaks" type correctness as arbitrary structure
> >types appear as operand zero.
> 
> Agreed.  Unless this was done to fix some target-specific problem, I 
> think it should disappear.

Perhaps not in early GIMPLE passes, but we certainly want to lower
bitfield accesses to BIT_FIELD_REFs or something similar before expansion,
otherwise expander and RTL optimization passes aren't able to optimize but
the most trivial cases.  GCC currently generates terrible code for bitfields;
try, say:
struct S
{
  unsigned int a : 3;
  unsigned int b : 3;
  unsigned int c : 3;
  unsigned int d : 3;
  unsigned int e : 3;
  unsigned int f : 3;
  unsigned int g : 3;
  unsigned int h : 11;
} a, b, c;

void foo (void)
{
  a.a = b.a | c.a;
  a.b = b.b | c.b;
  a.c = b.c | c.c;
  a.d = b.d | c.d;
  a.e = b.e | c.e;
  a.f = b.f | c.f;
  a.g = b.g | c.g;
  a.h = b.h | c.h;
}
which could be optimized into BIT_FIELD_REF  = BIT_FIELD_REF  | BIT_FIELD_REF ;
so something like 3 or 4 instructions, yet we generate 51.
Operating on adjacent bitfield fields is fairly common.
Similarly (and perhaps far more common in the wild) is e.g.
void bar (void)
{
  a.a = 1;
  a.b = 2;
  a.c = 3;
  a.d = 4;
  a.e = 5;
  a.f = 6;
  a.g = 7;
  a.h = 8;
}
On x86_64 the trunk generates 24 instructions for this; 1 is enough.
RTL is too late to try to optimize this; I've tried that once.
Given the combiner's limitation of only trying to combine 3 instructions
at once, we'd need more.  So this is something that needs to
be optimized at the tree level, either by having a separate pass
that takes care of it, or by lowering it early enough into something
that the optimizers will handle.
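The word-level code the optimizers should ideally reach for foo and bar can be written by hand as a sketch. foo_word and bar_word are hypothetical names, not GCC output. foo_word is layout-independent because OR is bitwise; bar_word assumes the first-declared field occupies the least significant bits, which holds for GCC on little-endian x86/x86_64 but is implementation-defined in general.

```c
#include <string.h>

struct S
{
  unsigned int a : 3;
  unsigned int b : 3;
  unsigned int c : 3;
  unsigned int d : 3;
  unsigned int e : 3;
  unsigned int f : 3;
  unsigned int g : 3;
  unsigned int h : 11;
} a, b, c;

/* 7 * 3 + 11 = 32 bits: the struct is exactly one unsigned int.  */
_Static_assert (sizeof (struct S) == sizeof (unsigned int),
                "sketch assumes struct S occupies one unsigned int");

/* foo () as one load from each source, one OR, one store.  */
void foo_word (void)
{
  unsigned int wb, wc, wa;
  memcpy (&wb, &b, sizeof wb);
  memcpy (&wc, &c, sizeof wc);
  wa = wb | wc;
  memcpy (&a, &wa, sizeof wa);
}

/* bar () as a single store of a precomputed constant.
   ASSUMPTION: little-endian bit-field allocation (field a at bit 0).  */
void bar_word (void)
{
  const unsigned int w = 1u | 2u << 3 | 3u << 6 | 4u << 9 | 5u << 12
                         | 6u << 15 | 7u << 18 | 8u << 21;
  memcpy (&a, &w, sizeof w);
}
```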

Jakub


Re: Constrain valid arguments to BIT_FIELD_REF

2008-03-04 Thread Richard Guenther
On Tue, 4 Mar 2008, Jakub Jelinek wrote:

> On Tue, Mar 04, 2008 at 11:15:00AM -0500, Diego Novillo wrote:
> > >fold currently optimizes a.b.c == 0 to BIT_FIELD_REF  & 1
> > >for bit field field-decls c.  IMHO this is bad because it pessimizes
> > >TBAA (needs to use a's alias set, not the underlying integral type
> > >alias set) and it "breaks" type correctness as arbitrary structure
> > >types appear as operand zero.
> > 
> > Agreed.  Unless this was done to fix some target-specific problem, I 
> > think it should disappear.
> 
> Perhaps not in early GIMPLE passes, but we certainly want to lower
> bitfield accesses to BIT_FIELD_REFs or something similar before expansion,
> otherwise expander and RTL optimization passes aren't able to optimize but
> the most trivial cases.  GCC generates for bitfields terrible code ATM,
> try say:
> struct S
> {
>   unsigned int a : 3;
>   unsigned int b : 3;
>   unsigned int c : 3;
>   unsigned int d : 3;
>   unsigned int e : 3;
>   unsigned int f : 3;
>   unsigned int g : 3;
>   unsigned int h : 11;
> } a, b, c;
> 
> void foo (void)
> {
>   a.a = b.a | c.a;
>   a.b = b.b | c.b;
>   a.c = b.c | c.c;
>   a.d = b.d | c.d;
>   a.e = b.e | c.e;
>   a.f = b.f | c.f;
>   a.g = b.g | c.g;
>   a.h = b.h | c.h;
> }
> which could be optimized into BIT_FIELD_REF  = BIT_FIELD_REF  32, 0> | BIT_FIELD_REF ;
> so something like 3 or 4 instructions, yet we generate 51.
> Operating on adjacent bitfield fields is fairly common.
> Similarly (and perhaps far more common in the wild) is e.g.
> void bar (void)
> {
>   a.a = 1;
>   a.b = 2;
>   a.c = 3;
>   a.d = 4;
>   a.e = 5;
>   a.f = 6;
>   a.g = 7;
>   a.h = 8;
> }
> - on x86_64 24 instructions on the trunk, 1 is enough.
> RTL is too late to try to optimize this, I've tried that once.
> Given combiner's limitation of only trying to combine 3 instructions
> at once, we'd need more.  So this is something that needs to
> be optimized at the tree level, either by having a separate pass
> that takes care of it, or by lowering it early enough into something
> that the optimizers will handle.

Sure.  With 4.3 SRA tries to do this.  With the MEM_REF lowering I have
we optimize the above to

foo ()
{
  unsigned int MEML.2;
  unsigned int MEML.1;
  unsigned int MEML.0;

:
  MEML.0 = MEM ;
  MEML.1 = MEM ;
  MEML.2 = MEM ;

(load all three words once)

  MEM  = BIT_FIELD_EXPR ) ((unsigned 
char) BIT_FIELD_REF  | (unsigned char) BIT_FIELD_REF 
), 3, 0>, () ((unsigned char) 
BIT_FIELD_REF  | (unsigned char) BIT_FIELD_REF ), 3, 3>, () ((unsigned char) BIT_FIELD_REF  | (unsigned char) BIT_FIELD_REF ), 3, 6>, 
() ((unsigned char) BIT_FIELD_REF  | 
(unsigned char) BIT_FIELD_REF ), 3, 9>, 
() ((unsigned char) BIT_FIELD_REF  | 
(unsigned char) BIT_FIELD_REF ), 3, 12>, 
() ((unsigned char) BIT_FIELD_REF  | 
(unsigned char) BIT_FIELD_REF ), 3, 15>, 
() ((unsigned char) BIT_FIELD_REF  | 
(unsigned char) BIT_FIELD_REF ), 3, 18>, 
() ((short unsigned int) BIT_FIELD_REF  | (short unsigned int) BIT_FIELD_REF ), 11, 21>;
  return;

}

TER makes a mess out of the expression and obviously we miss some
expression combining here (I only have trivial constant folding
implemented for BIT_FIELD_EXPR right now).

Richard.


Re: atomic accesses

2008-03-04 Thread Jakub Jelinek
On Tue, Mar 04, 2008 at 04:37:29PM +, Andrew Haley wrote:
> >Typically those would be found in asm statements.
> 
> >I suspect it would be valuable to have standardized primitives for
> >atomic actions (semaphores, spinlocks, test-and-set primitives,
> >circular buffers, pick one).
> 
> We already have these in gcc, and they're even documented.

We don't have atomic read or atomic write builtins (ok, you could
abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop
with __sync_val_compare_and_swap for atomic store, but that's horrible
overkill).  Being able to assume that for non-bitfield accesses bigger
than certain minimum size, smaller or equal to the word size and
naturally aligned the compiler will read or write a value in one lump
is certainly desirable and many programs assume it heavily (starting with
glibc, kernel, libgomp, ...).
The "certain minimum size" is typically either size of char, or (e.g. on old
alphas) size of int.  Typically the programs care about atomicity of
accesses to int, long and pointer sized vars, e.g. have only threads in
a critical section modify a variable, but be able to read that variable
outside of critical section and see only values that were written in the
critical section, not say half of an old value and half of a new value.
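The "overkill" workaround can be sketched with the existing __sync builtins; the function names are hypothetical. The read performs a full atomic read-modify-write just to fetch a value, and the store loops on compare-and-swap until it wins, which is exactly why dedicated load/store primitives would be preferable.

```c
/* Atomic read by abusing an RMW builtin: add 0, return the old value.  */
static long atomic_read_overkill (long *p)
{
  return __sync_fetch_and_add (p, 0);
}

/* Atomic store via a compare-and-swap loop.  */
static void atomic_store_overkill (long *p, long newval)
{
  long old = *p;   /* only a guess; the CAS below validates it */
  for (;;)
    {
      long seen = __sync_val_compare_and_swap (p, old, newval);
      if (seen == old)
        break;       /* our CAS succeeded */
      old = seen;    /* *p changed under us; retry with what we saw */
    }
}
```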

Jakub


Re: Constrain valid arguments to BIT_FIELD_REF

2008-03-04 Thread Andrew Pinski
On 3/4/08, Richard Guenther <[EMAIL PROTECTED]> wrote:
>  I suggest to make sure that bit position and size are constants, the
>  object referenced is of integral type (BIT_FIELD_REF should not be
>  used as a way to circumvent aliasing) and the result type is of the
>  same type as the operand zero type (and not a bitfield type of the
>  referenced size -- in which case the BIT_FIELD_REF_UNSIGNED would
>  be useless).  The result would then be properly extended according
>  to BIT_FIELD_REF_UNSIGNED.

I tried a non-constant bit position with BIT_FIELD_REF of vector types
and it crashed in expand, so I think this is the correct thing to do.
Though it would be nice to have a VEC_EXTRACT tree code that takes a
non-constant position, instead of overloading BIT_FIELD_REF for it, so
we can do better optimization there in some cases (yes, people write
code that extracts parts of vectors, trust me).
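The kind of source code meant here looks like the sketch below, using GCC's vector extension with a run-time lane index (vector subscripting is supported by later GCC releases). extract_lane is a hypothetical name. A constant index maps naturally onto BIT_FIELD_REF; a variable one forces the compiler to do something else, for example go through memory.

```c
/* GCC vector extension: four floats in one 16-byte vector.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: extract the lane chosen at run time.
   i must be in [0, 3]; it is not range-checked.  */
static float extract_lane (v4sf v, int i)
{
  return v[i];
}
```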

-- Pinski


Re: atomic accesses

2008-03-04 Thread Andrew Haley

Jakub Jelinek wrote:

On Tue, Mar 04, 2008 at 04:37:29PM +, Andrew Haley wrote:

Typically those would be found in asm statements.
I suspect it would be valuable to have standardized primitives for
atomic actions (semaphores, spinlocks, test-and-set primitives,
circular buffers, pick one).

We already have these in gcc, and they're even documented.


We don't have atomic read or atomic write builtins (ok, you could
abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop
with __sync_val_compare_and_swap for atomic store, but that's horrible
overkill).  Being able to assume that for non-bitfield accesses bigger
than certain minimum size, smaller or equal to the word size and
naturally aligned the compiler will read or write a value in one lump
is certainly desirable and many programs assume it heavily (starting with
glibc, kernel, libgomp, ...).


That seems reasonable, but I suspect that coming up with wordage to
describe it sufficiently formally for all cases will be tricky.

AFAIK the only reason we don't break this rule is that doing so would
be grossly inefficient; there's nothing to stop any gcc back-end with
(say) seriously slow DImode writes from using two SImode writes instead.
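What such a split store would mean for a concurrent reader can be simulated deterministically in a single thread. Everything here is hypothetical: store64_split stands in for a back end emitting two SImode stores (low word first), and the observe callback stands in for another thread reading between them, which then sees half of the old value and half of the new one.

```c
#include <stdint.h>

/* Hypothetical split 64-bit store: two 32-bit writes, low word first.
   observe, if non-null, runs between the two halves.  */
static void
store64_split (volatile uint32_t half[2], uint64_t v, void (*observe) (void))
{
  half[0] = (uint32_t) v;            /* first SImode write */
  if (observe)
    observe ();
  half[1] = (uint32_t) (v >> 32);    /* second SImode write */
}

static volatile uint32_t word[2];
static uint64_t seen;

/* The "other thread": reassemble whatever is in memory right now.  */
static void snapshot (void)
{
  seen = ((uint64_t) word[1] << 32) | word[0];
}
```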


The "certain minimum size" is typically either size of char, or (e.g. on old
alphas) size of int.  Typically the programs care about atomicity of
accesses to int, long and pointer sized vars, e.g. have only threads in
a critical section modify a variable, but be able to read that variable
outside of critical section and see only values that were written in the
critical section, not say half of an old value and half of a new value.


Andrew.


Re: atomic accesses

2008-03-04 Thread Paul Koning
> "Andrew" == Andrew Haley <[EMAIL PROTECTED]> writes:

 >>  We don't have atomic read or atomic write builtins (ok, you could
 >> abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop with
 >> __sync_val_compare_and_swap for atomic store, but that's a
 >> horrible overkill.  Being able to assume that for non-bitfield
 >> accesses bigger than certain minimum size, smaller or equal to the
 >> word size and naturally aligned the compiler will read or write a
 >> value in one lump is certainly desirable and many programs assume
 >> it heavily (starting with glibc, kernel, libgomp, ...).

 Andrew> That seems reasonable, but I suspect that coming up with
 Andrew> wordage to describe it sufficiently formally for all cases
 Andrew> will be tricky.

Probably so.  In any case, clearly no such wordage could include the
phrase "when possible".

 Andrew> AFAIK the only reason we don't break this rule is that doing
 Andrew> so would be grossly inefficient; there's nothing to stop any
 Andrew> gcc back-end with (say) seriously slow DImode writes from
 Andrew> using two SImode writes instead.

Or, say, because you're using the MIPS O32 ABI rather than the N32/N64
ABI... which is yet another example why a formal rule is tricky.

 >> The "certain minimum size" is typically either size of char, or
 >> (e.g. on old alphas) size of int.  

Not just old Alphas.  Even if the instruction set has a "store
character" opcode, the hardware is going to do a read-modify-write
internally; is that an atomic operation?  Not necessarily.  Perhaps
even "only rarely".

 paul



Re: Constrain valid arguments to BIT_FIELD_REF

2008-03-04 Thread Richard Guenther
On Tue, 4 Mar 2008, Andrew Pinski wrote:

> On 3/4/08, Richard Guenther <[EMAIL PROTECTED]> wrote:
> >  I suggest to make sure that bit position and size are constants, the
> >  object referenced is of integral type (BIT_FIELD_REF should not be
> >  used as a way to circumvent aliasing) and the result type is of the
> >  same type as the operand zero type (and not a bitfield type of the
> >  referenced size -- in which case the BIT_FIELD_REF_UNSIGNED would
> >  be useless).  The result would then be properly extended according
> >  to BIT_FIELD_REF_UNSIGNED.
> 
> I tried non constant bit position with BIT_FIELD_REF of vector types
> and it crashed in expand so I think this is the correct thing to do.
> Though it would be nice if we have a VEC_EXTRACT tree instead of
> overloading BIT_FIELD_REF for it that takes a non constant position so
> we can do better optimization there in some cases (yes people write
> code that extracts parts of vectors, trust me).

FWIW I agree.  After all we also have REALPART_EXPR and IMAGPART_EXPR,
a VEC_EXTRACT sounds fine (after all we already have a lot of VEC_
codes).

At least BIT_FIELD_REF should not be VIEW_CONVERT_EXPR on steroids.

Richard.


Re: Benchmarks: 7z, bzip2 & gzip.

2008-03-04 Thread Daniel Jacobowitz
On Tue, Mar 04, 2008 at 09:02:34AM +, Martin Guy wrote:
> Is there a clause in regressions for "takes longer to compile and
> produces worse code"?

Worse code is a regression, so is slower compile time.  Both are
judgement calls; some of them are not going to be changed, but safe
patches changing them are allowed on regression-only branches.

-- 
Daniel Jacobowitz
CodeSourcery


[4.3/4.4]: PATCH: PR target/35453: nmmintrin.h defines macros SIDD_XXX

2008-03-04 Thread H.J. Lu
Hi,

Here is the patch for both gcc 4.3 and 4.4. OK for 4.3/4.4? Tested on Linux/ia32
and Linux/ia64 with gcc 4.3/4.4.

Thanks.


H.J.
On Tue, Mar 4, 2008 at 1:19 AM, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On Mon, 3 Mar 2008, H.J. Lu wrote:
>
>  > Hi,
>  >
>  > I'd like to fix
>  >
>  > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35453
>  >
>  > for gcc 4.3. Defines SIDD_XXX in SSE4 header file is a bad idea. SSE 4
>  > header file
>  > in icc will also be fixed.
>
>  Works for me.
>
>  Richard.
>
gcc/

2008-03-03  H.J. Lu  <[EMAIL PROTECTED]>

	PR target/35453
	* config/i386/smmintrin.h (SIDD_XXX): Renamed to ...
	(_SIDD_XXX): This.

gcc/testsuite/

2008-03-03  H.J. Lu  <[EMAIL PROTECTED]>

	PR target/35453
	* gcc.target/i386/sse4_2-pcmpestri-1.c: Replace SIDD_XXX with
	_SIDD_XXX.
	* gcc.target/i386/sse4_2-pcmpestri-2.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpestrm-1.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpestrm-2.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpistri-1.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpistri-2.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpistrm-1.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpistrm-2.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpstr.h: Likewise.

--- gcc/config/i386/smmintrin.h.sidd	2007-12-15 15:49:23.0 -0800
+++ gcc/config/i386/smmintrin.h	2008-03-03 20:22:22.0 -0800
@@ -470,30 +470,30 @@ _mm_stream_load_si128 (__m128i *__X)
 #ifdef __SSE4_2__
 
 /* These macros specify the source data format.  */
-#define SIDD_UBYTE_OPS			0x00
-#define SIDD_UWORD_OPS			0x01
-#define SIDD_SBYTE_OPS			0x02
-#define SIDD_SWORD_OPS			0x03
+#define _SIDD_UBYTE_OPS			0x00
+#define _SIDD_UWORD_OPS			0x01
+#define _SIDD_SBYTE_OPS			0x02
+#define _SIDD_SWORD_OPS			0x03
 
 /* These macros specify the comparison operation.  */
-#define SIDD_CMP_EQUAL_ANY		0x00
-#define SIDD_CMP_RANGES			0x04
-#define SIDD_CMP_EQUAL_EACH		0x08
-#define SIDD_CMP_EQUAL_ORDERED		0x0c
+#define _SIDD_CMP_EQUAL_ANY		0x00
+#define _SIDD_CMP_RANGES			0x04
+#define _SIDD_CMP_EQUAL_EACH		0x08
+#define _SIDD_CMP_EQUAL_ORDERED		0x0c
 
 /* These macros specify the the polarity.  */
-#define SIDD_POSITIVE_POLARITY		0x00
-#define SIDD_NEGATIVE_POLARITY		0x10
-#define SIDD_MASKED_POSITIVE_POLARITY	0x20
-#define SIDD_MASKED_NEGATIVE_POLARITY	0x30
+#define _SIDD_POSITIVE_POLARITY		0x00
+#define _SIDD_NEGATIVE_POLARITY		0x10
+#define _SIDD_MASKED_POSITIVE_POLARITY	0x20
+#define _SIDD_MASKED_NEGATIVE_POLARITY	0x30
 
 /* These macros specify the output selection in _mm_cmpXstri ().  */
-#define SIDD_LEAST_SIGNIFICANT		0x00
-#define SIDD_MOST_SIGNIFICANT		0x40
+#define _SIDD_LEAST_SIGNIFICANT		0x00
+#define _SIDD_MOST_SIGNIFICANT		0x40
 
 /* These macros specify the output selection in _mm_cmpXstrm ().  */
-#define SIDD_BIT_MASK			0x00
-#define SIDD_UNIT_MASK			0x40
+#define _SIDD_BIT_MASK			0x00
+#define _SIDD_UNIT_MASK			0x40
 
 /* Intrinsics for text/string processing.  */
 
--- gcc/testsuite/gcc.target/i386/sse4_2-pcmpestri-1.c.sidd	2007-08-23 09:44:31.0 -0700
+++ gcc/testsuite/gcc.target/i386/sse4_2-pcmpestri-1.c	2008-03-03 20:23:20.0 -0800
@@ -8,15 +8,15 @@
 #define NUM 1024
 
 #define IMM_VAL0 \
-  (SIDD_SBYTE_OPS | SIDD_CMP_RANGES | SIDD_MASKED_POSITIVE_POLARITY)
+  (_SIDD_SBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_MASKED_POSITIVE_POLARITY)
 #define IMM_VAL1 \
- (SIDD_UBYTE_OPS | SIDD_CMP_EQUAL_EACH | SIDD_NEGATIVE_POLARITY \
-  | SIDD_MOST_SIGNIFICANT)
+ (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_NEGATIVE_POLARITY \
+  | _SIDD_MOST_SIGNIFICANT)
 #define IMM_VAL2 \
- (SIDD_UWORD_OPS | SIDD_CMP_EQUAL_ANY | SIDD_MASKED_NEGATIVE_POLARITY)
+ (_SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_MASKED_NEGATIVE_POLARITY)
 #define IMM_VAL3 \
-  (SIDD_SWORD_OPS | SIDD_CMP_EQUAL_ORDERED \
-   | SIDD_MASKED_NEGATIVE_POLARITY | SIDD_LEAST_SIGNIFICANT)
+  (_SIDD_SWORD_OPS | _SIDD_CMP_EQUAL_ORDERED \
+   | _SIDD_MASKED_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT)
 
 
 static void
--- gcc/testsuite/gcc.target/i386/sse4_2-pcmpestri-2.c.sidd	2007-08-23 09:44:31.0 -0700
+++ gcc/testsuite/gcc.target/i386/sse4_2-pcmpestri-2.c	2008-03-03 20:23:25.0 -0800
@@ -8,15 +8,15 @@
 #define NUM 1024
 
 #define IMM_VAL0 \
-  (SIDD_SBYTE_OPS | SIDD_CMP_RANGES | SIDD_MASKED_POSITIVE_POLARITY)
+  (_SIDD_SBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_MASKED_POSITIVE_POLARITY)
 #define IMM_VAL1 \
- (SIDD_UBYTE_OPS | SIDD_CMP_EQUAL_EACH | SIDD_NEGATIVE_POLARITY \
-  | SIDD_MOST_SIGNIFICANT)
+ (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_NEGATIVE_POLARITY \
+  | _SIDD_MOST_SIGNIFICANT)
 #define IMM_VAL2 \
- (SIDD_UWORD_OPS | SIDD_CMP_EQUAL_ANY | SIDD_MASKED_NEGATIVE_POLARITY)
+ (_SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_MASKED_NEGATIVE_POLARITY)
 #define IMM_VAL3 \
-  (SIDD_SWORD_OPS | SIDD_CMP_EQUAL_ORDERED \
-   | SIDD_MASKED_NEGATIVE_POLARITY | SIDD_LEAST_SIGNIFICANT)
+  (_SIDD_SWORD_OPS | _SIDD_CMP_EQUAL_ORDERED \
+   | _SIDD_MASKED_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT)
 
 
 static void
--- gcc/testsuite/gcc.tar

Re: atomic accesses

2008-03-04 Thread David Daney

Jakub Jelinek wrote:
> On Tue, Mar 04, 2008 at 04:37:29PM +, Andrew Haley wrote:
>>> Typically those would be found in asm statements.
>>> I suspect it would be valuable to have standardized primitives for
>>> atomic actions (semaphores, spinlocks, test-and-set primitives,
>>> circular buffers, pick one).
>> We already have these in gcc, and they're even documented.
>
> We don't have atomic read or atomic write builtins (ok, you could
> abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop
> with __sync_val_compare_and_swap for atomic store, but that's a horrible
> overkill).


There is nothing preventing us from adding __sync_fetch and __sync_store 
so that we could avoid the overkill.


Perhaps anything declared volatile should have these semantics. 
Although mentioning 'volatile' on the lkml is probably not a good idea.
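
For reference, the workaround Jakub describes can be sketched with the existing __sync builtins. This is a minimal illustration, not a proposal: the wrapper names atomic_read_int, atomic_store_int and demo_store_then_read are hypothetical. Both wrappers go through a locked read-modify-write (and, per the GCC manual, the __sync builtins are in most cases full barriers) just to load or store one int, which is the overkill being complained about.

```c
/* Hypothetical wrapper: atomic read by abusing fetch-and-add with an
   addend of 0.  The value is returned unchanged, but the access is a
   full read-modify-write.  */
static int
atomic_read_int (int *p)
{
  return __sync_fetch_and_add (p, 0);
}

/* Hypothetical wrapper: atomic store via a compare-and-swap loop.
   Retry until no other writer changed *P between our read and our CAS;
   __sync_val_compare_and_swap returns the value *P held before.  */
static void
atomic_store_int (int *p, int newval)
{
  int old = atomic_read_int (p);
  int seen;

  while ((seen = __sync_val_compare_and_swap (p, old, newval)) != old)
    old = seen;
}

/* Example use: store 42 into a counter and read it back.  */
static int
demo_store_then_read (void)
{
  int counter = 7;

  atomic_store_int (&counter, 42);
  return atomic_read_int (&counter);
}
```

A dedicated __sync_fetch / __sync_store pair, as suggested above, would let each of these collapse to a single plain load or store on targets where that is already atomic.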



David Daney


Re: atomic accesses

2008-03-04 Thread Richard Guenther
On Tue, Mar 4, 2008 at 7:31 PM, David Daney <[EMAIL PROTECTED]> wrote:
> Jakub Jelinek wrote:
>  > On Tue, Mar 04, 2008 at 04:37:29PM +, Andrew Haley wrote:
>  >>> Typically those would be found in asm statements.
>  >>> I suspect it would be valuable to have standardized primitives for
>  >>> atomic actions (semaphores, spinlocks, test-and-set primitives,
>  >>> circular buffers, pick one).
>  >> We already have these in gcc, and they're even documented.
>  >
>  > We don't have atomic read or atomic write builtins (ok, you could
>  > abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop
>  > with __sync_val_compare_and_swap for atomic store, but that's a horrible
>  > overkill).
>
>  There is nothing preventing us from adding __sync_fetch and __sync_store
>  so that we could avoid the overkill.
>
>  Perhaps anything declared volatile should have these semantics.
>  Although mentioning 'volatile' on the lkml is probably not a good idea.

Certainly not.  volatile has nothing to do with atomic access.

Richard.


Re: atomic accesses

2008-03-04 Thread David Daney

Richard Guenther wrote:

> On Tue, Mar 4, 2008 at 7:31 PM, David Daney <[EMAIL PROTECTED]> wrote:
>>  Perhaps anything declared volatile should have these semantics.
>>  Although mentioning 'volatile' on the lkml is probably not a good idea.
>
> Certainly not.  volatile has nothing to do with atomic access.

It was more of an idle thought than a proposal.

David Daney



Re: birthpoints in rtl.

2008-03-04 Thread Steven Bosscher
On Sat, Mar 1, 2008 at 2:46 PM, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
>
>  > By the way, I still don't understand how birth points would work.  Can
>  > someone give an example of what the insn stream would look like with
>  > birth points, and what the DU/UD chains would look like?
>
>  With a big IIUC, and using a high-level IR for simplicity
>
> if (a < 5) goto BB1; else goto BB2;
>   BB1: b = 3; goto BB3;
>   BB2: b = c; goto BB3;
>   BB3: return b * b;
>
> DF info for b:
>   insn  has def D1
>   insn  has def D2
>   insn  has use U1 (use-def chain [D1,D2]) and
>             use U2 (use-def chain [D1,D2])
>
>  becomes
>
> if (a < 5) goto BB1; else goto BB2;
>   BB1: b = 3; goto BB3;
>   BB2: b = c; goto BB3;
>   BB3: b = b; return b * b;
>
> DF info for b:
>   insn  has def D1
>   insn  has def D2
>   birthpoint  has use U1 (use-def chain [D1,D2])
>                   and def D3
>   insn  has use U2 (use-def chain [D3])
>             and use U3 (use-def chain [D3])
>
>  Basically the only non-singleton UD chains are for a birthpoint's RHS,
>  and the UD chains of birthpoints correspond to PHI operands.  The
>  singleton UD chains correspond to subscripted SSA variables.  I think
>  this is isomorphic to FUD chains.

Thanks for this explanation!  I had the feeling that FUD chains and
birth points are just different names for basically the same thing.
That also seems to be your interpretation.

So, from an implementation point of view, would we make PHI-like UD-chains to nop
insns that represent the birth points, or would we actually add PHI
functions and let the "normal" UD-chains point to the PHI function
arguments?

What about keeping things up-to-date after applying some
transformations? It is already hard to keep UD/DU chains up-to-date
now (I don't think any pass successfully does so right now). This
should be a lot easier if you fully factorize your UD chains, right?

Gr.
Steven


Re: birthpoints in rtl.

2008-03-04 Thread Diego Novillo

On 3/4/08 1:53 PM, Steven Bosscher wrote:


> So, from an implementation point of view, would we make PHI-like
> UD-chains to nop insns that represent the birth points, or would we
> actually add PHI functions and let the "normal" UD-chains point to
> the PHI function arguments?


Why put them in the IL stream at all?  All you need is to have these 
factoring devices in the DF web.  It really doesn't need to be part of 
the IL stream.


Both PHIs and birthpoints are merely factoring devices that let you cut 
down the number of UD links.  They don't need to be part of the IL, much 
like none of the DF objects are part of the RTL IL.
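
To put numbers on "cut down the number of UD links": if m definitions reach a merge point and n uses follow it, unfactored chains need a link from every use to every reaching definition, whereas a single factoring node (a PHI or a birthpoint) needs m links for its own use plus one link per original use. A trivial C sketch of the two counts (the function names are illustrative only):

```c
/* UD links when each of N uses is linked directly to all M definitions
   that merge above it.  */
static unsigned
ud_links_unfactored (unsigned m, unsigned n)
{
  return m * n;
}

/* UD links when one factoring node (PHI or birthpoint) sits at the
   merge: the node's own use links to the M definitions, and each of the
   N uses links only to the node's single new definition.  */
static unsigned
ud_links_factored (unsigned m, unsigned n)
{
  return m + n;
}
```

For Paolo's example (m = 2, n = 2) both counts are 4, so nothing is saved; with 3 definitions and 10 uses it is 30 links versus 13.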




> What about keeping things up-to-date after applying some
> transformations? It is already hard to keep UD/DU chains up-to-date
> now (I don't think any pass successfully does so right now). This
> should be a lot easier if you fully factorize your UD chains, right?


In theory, yes.  Code for keeping these things up-to-date already exists 
in GIMPLE SSA.



Diego.


Re: birthpoints in rtl.

2008-03-04 Thread Steven Bosscher
On Sat, Mar 1, 2008 at 11:13 AM, Jan Hubicka <[EMAIL PROTECTED]> wrote:
> > Diego,
>  >
>  > I am leaning to just adding noop moves at the birthpoints (dominance
>  > frontiers) as real noop move insns in the streams in the passes that use
>  > ud or du chains.   The back end is tolerant of noop moves and without
>
>  Hi,
>  while I am with Diego in preferring PHI nodes on the side, especially in
>  FUD chains where the rest of your SSA is on the side too.  But if we go with the
>  extra instruction scheme, I think you are much better off using an RTL USE
>  instruction.  The moves are generated by target machinery and can do
>  funny things, like clobbering flags or whatever.  USEs are transparent
>  this way.

I think that without the extra instruction it will be difficult to
keep the FUD chains up to date.  One of the nice features of the extra
instruction (be it a PHI, trivial move, or a USE) is that it
explicitly gives you a new location for a DEF.   That makes updating
things a lot easier, I suspect.  Consider e.g. replacing an operand in
an insn.  Because you know that only the DEF in the extra instruction
will reach the operand, you can update the UD chains right away.  This
is harder if there are more reaching defs.  The most trivial example I
can think of is:

if (c)
  x = ...;  D1(x)
else
  x = ...;  D2(x)

y = x;      D4(y), U2(x), UD(U2,D1), UD(U2,D2)
z = y;      D5(z), U3(y), UD(U3,D4)

If you copy propagate x to z, you have to add an extra UD chain.  With
the extra instruction, you don't:

if (c)
  x = ...;  D1(x)
else
  x = ...;  D2(x)

x = x;      D3(x), U1(x), UD(U1,D1|D2)
y = x;      D4(y), U2(x), UD(U2,D3)
z = y;      D5(z), U3(y), UD(U3,D4)


For the location of the extra instructions, I would *not* keep them on
the side.  If you have something special going on, my motto is: "Make
it explicit".
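
The copy-propagation example above can be modelled with a toy representation in which each use simply records the ids of the definitions that reach it. This is a hypothetical sketch (struct ud_chain, propagate_copy and chain_length_after_propagation are made up for illustration and are not GCC's df API), with definitions numbered as in the listings: rewriting the use of y in "z = y" to read x means the rewritten use inherits the chain of the use of x in "y = x", which has two entries without the extra "x = x" and one entry with it.

```c
#include <string.h>

#define MAX_DEFS 8

/* Toy UD chain: the ids of the definitions that reach one use.  */
struct ud_chain
{
  int ndefs;
  int defs[MAX_DEFS];
};

/* Copy propagation of "y = x; z = y;" into "z = x;": the use in "z = y"
   (DST) now reads x, so it inherits the chain of the use of x in
   "y = x" (SRC).  */
static void
propagate_copy (struct ud_chain *dst, const struct ud_chain *src)
{
  memcpy (dst->defs, src->defs, (size_t) src->ndefs * sizeof src->defs[0]);
  dst->ndefs = src->ndefs;
}

/* Length of U3's chain after the propagation.  With FACTORED nonzero the
   extra "x = x" (D3) exists, so only D3 reaches U2; otherwise both D1
   and D2 do.  */
static int
chain_length_after_propagation (int factored)
{
  struct ud_chain u2 = factored ? (struct ud_chain) { 1, { 3 } }
                                : (struct ud_chain) { 2, { 1, 2 } };
  struct ud_chain u3 = { 1, { 4 } };   /* use of y, reached only by D4 */

  propagate_copy (&u3, &u2);
  return u3.ndefs;
}
```

Without the extra instruction the chain of U3 grows from one entry to two; with it, U3 still has a single entry (D3).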

Gr.
Steven


Re: birthpoints in rtl.

2008-03-04 Thread Steven Bosscher
On Tue, Mar 4, 2008 at 7:58 PM, Diego Novillo <[EMAIL PROTECTED]> wrote:
>  Both PHIs and birthpoints are merely factoring devices that let you cut
>  down the number of UD links.  They don't need to be part of the IL, much
>  like none of the DF objects are part of the RTL IL.

Maybe they don't need to be, but it may be useful to have them anyway.

>  > What about keeping things up-to-date after applying some
>  > transformations? It is already hard to keep UD/DU chains up-to-date
>  > now (I don't think any pass successfully does so right now). This
>  > should be a lot easier if you fully factorize your UD chains, right?
>
>  In theory, yes.  Code for keeping these things up-to-date already exists
>  in GIMPLE SSA.

That code is IMHO just awfully ugly. And slow too, last I checked.  I
don't know if this is still true, but update_ssa used to walk a huge
part of the dominator tree even for seemingly trivial updates. We
should not want that on RTL.  I don't think we should allow
transformations on RTL that are too hard to manually update the FUD
chains somehow.

Gr.
Steven


Re: birthpoints in rtl.

2008-03-04 Thread Diego Novillo

On 3/4/08 2:12 PM, Steven Bosscher wrote:


> That code is IMHO just awfully ugly. And slow too, last I checked.


Yeah, there's quite a bit of bookkeeping needed to do incremental SSA 
updates.



> should not want that on RTL.  I don't think we should allow
> transformations on RTL that are too hard to manually update the FUD
> chains somehow.


For FUD chains, the incremental work is actually easier to code.  You 
need to re-write the FUD chains for the affected objects from scratch. 
Forcing the optimizers to maintain this themselves is certainly easier 
for the framework code, but it imposes a heavier burden on the optimizers.



Diego.


Re: Benchmarks: 7z, bzip2 & gzip.

2008-03-04 Thread Bernardo Innocenti
J.C. Pizarro wrote:

>   p7zip-4.57
> [...]
> 1. 1m50s compile, 1630164 file, 1618639 text, 6120 data, 27168 bss, 5m50s run.
> 2. 1m53s compile, 1665952 file, 1649829 text, 4668 data, 29160 bss, 6m04s run.
> 3. 2m08s compile, 1629088 file, 1613313 text, 4672 data, 29160 bss, 5m54s run.
> 4. 2m36s compile, 2063216 file, 2047420 text, 4380 data, 29160 bss, 6m14s run.
> 5. 2m30s compile, 1976228 file, 1960164 text, 4380 data, 29160 bss, 6m12s run.

Has anybody analyzed this code size and performance regression?
For simplicity, we could limit our comparison to 3 (gcc-4.2.4)
against 4 (gcc-4.3.0) or 5 (gcc-4.4.0).

For the performance regression, one could proceed like this:
 - profile the 7z run to find the hot-spot(s)
 - disassemble the output of both compilers
 - look for obvious pessimizations in the hot parts

We're looking for a 6-7% change in time, so it may very well
be a single instruction.
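
As a quick sanity check on the figures above, the run times can be converted to seconds and compared. In this small C sketch (the helper names are illustrative), row 4 comes out about 5.6% slower than row 3 and row 5 about 5.1% slower, while against row 1 the slowdowns are about 6.9% and 6.3%; the text section of row 4 is about 27% larger than that of row 3.

```c
/* Convert a run time given as minutes and seconds to seconds.  */
static int
to_seconds (int minutes, int seconds)
{
  return minutes * 60 + seconds;
}

/* Relative change of NEW_VAL with respect to OLD_VAL, in percent.  */
static double
percent_change (double old_val, double new_val)
{
  return (new_val - old_val) / old_val * 100.0;
}
```

For example, percent_change (to_seconds (5, 54), to_seconds (6, 14)) gives about 5.65.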

-- 
 \___/
 |___|   Bernardo Innocenti - http://www.codewiz.org/
  \___\  One Laptop Per Child - http://www.laptop.org/


Re: birthpoints in rtl.

2008-03-04 Thread Kenneth Zadeck
Steven Bosscher wrote:
> On Tue, Mar 4, 2008 at 7:58 PM, Diego Novillo <[EMAIL PROTECTED]> wrote:
>   
>>  Both PHIs and birthpoints are merely factoring devices that let you cut
>>  down the number of UD links.  They don't need to be part of the IL, much
>>  like none of the DF objects are part of the RTL IL.
>> 
>
> Maybe they don't need to be, but it may be useful to have them anyway.
>
>   
>>  > What about keeping things up-to-date after applying some
>>  > transformations? It is already hard to keep UD/DU chains up-to-date
>>  > now (I don't think any pass successfully does so right now). This
>>  > should be a lot easier if you fully factorize your UD chains, right?
>>
>>  In theory, yes.  Code for keeping these things up-to-date already exists
>>  in GIMPLE SSA.
>> 
>
> That code is IMHO just awfully ugly. And slow too, last I checked.  I
> don't know if this is still true, but update_ssa used to walk a huge
> part of the dominator tree even for seemingly trivial updates. We
> should not want that on RTL.  I don't think we should allow
> transformations on RTL that are too hard to manually update the FUD
> chains somehow.
>
> Gr.
> Steven
>   
There are many differences between fud/birthpoints and real ssa:

1) In ssa, the operands of the phis and the renaming contain
information.   The operands are paired with the cfg edges that the
values come in on.   In fud/birthpoints there is no such pairing or
renaming.   For some problems, like conditional constant, this pairing
and renaming is what makes the algorithm work.  You actually do not get
the same answer (you get an inferior but still correct answer) if you do
not use the pairing and renaming.

2) There are two "kinds" of ssa algorithms that are used in gcc.   There
are dirty ssa algorithms and "clean" algorithms.  Dirty algorithms do
not keep ssa up to date as you make the transformations.  Clean ones
do.  I have never understood why it was necessary for gcc to use so many
dirty ssa algorithms.  When NaturalBridge built our compiler, we were
able to use almost exclusively clean algorithms.   Tricks like loop
closed ssa form, help, but in general this is a matter of care on the
algorithmic design side and implementation.   We never had something
like rebuild ssa at NaturalBridge.

I do not know how much this changes with memory ssa.  I assume clean
algorithms are harder here, but I have no experience with it.  I have
had particularly heated discussions with Zdenek over the years where he
asserted that it was not worth it/impossible to think about clean ssa
algorithms and I showed him simple tricks to keep things up to date.  I
believe he just simply ignored me.

fud/birthpoints are generally harder to develop clean algorithms for.  
I have never seen any published.   The information in the phis really
helps you develop clean algorithms.  

I would love to see the rtl back end use phis and renaming rather than
fuds/birthpoints.  The thing is that for phi functions to really be
profitable, you need to have a large number of passes in a row that are ssa
clean.  So my plan was to basically start small and just use
fud/birthpoints to control the space/time of the existing suite of passes. 
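
Point 1 above can be illustrated with a toy constant-propagation lattice. In this hypothetical C sketch (all names and the two-argument setup are illustrative assumptions, not GCC code), a merge of x = 1 (arriving on an edge known to execute) and x = 2 (arriving on an edge known never to execute) is evaluated twice: once with each argument paired with its incoming edge, as with PHI operands, so that arguments on non-executable edges are skipped; and once without any edge information, as with a birthpoint's plain list of reaching definitions.

```c
/* Constant-propagation lattice value: TOP (no information yet), a known
   constant, or BOTTOM (varying).  */
enum lattice_kind { LAT_TOP, LAT_CONST, LAT_BOTTOM };

struct lattice
{
  enum lattice_kind kind;
  long value;
};

/* Meet: TOP is the identity, equal constants stay constant, anything
   else drops to BOTTOM.  */
static struct lattice
meet (struct lattice a, struct lattice b)
{
  struct lattice bottom = { LAT_BOTTOM, 0 };

  if (a.kind == LAT_TOP)
    return b;
  if (b.kind == LAT_TOP)
    return a;
  if (a.kind == LAT_CONST && b.kind == LAT_CONST && a.value == b.value)
    return a;
  return bottom;
}

/* One merge argument: the incoming value and whether the CFG edge it
   arrives on has been found executable.  */
struct merge_arg
{
  int edge_executable;
  struct lattice val;
};

/* Evaluate a merge of N arguments.  With USE_EDGES nonzero (PHI-style
   pairing) arguments on non-executable edges are ignored; otherwise
   every reaching value must be met.  */
static struct lattice
eval_merge (const struct merge_arg *args, int n, int use_edges)
{
  struct lattice result = { LAT_TOP, 0 };

  for (int i = 0; i < n; i++)
    if (!use_edges || args[i].edge_executable)
      result = meet (result, args[i].val);
  return result;
}

/* "if (1) x = 1; else x = 2;": only the then-edge is executable.  */
static enum lattice_kind
demo_merge_kind (int use_edges)
{
  const struct merge_arg args[2] = {
    { 1, { LAT_CONST, 1 } },   /* x = 1, executable edge */
    { 0, { LAT_CONST, 2 } },   /* x = 2, edge never taken */
  };

  return eval_merge (args, 2, use_edges).kind;
}
```

With the pairing, x is found to be the constant 1; without it, the merge falls to BOTTOM, which is the inferior but still correct answer described in point 1.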

Kenny








Re: birthpoints in rtl.

2008-03-04 Thread Richard Sandiford
"Steven Bosscher" <[EMAIL PROTECTED]> writes:
> For the location of the extra instructions, I would *not* keep them on
> the side.  If you have something special going on, my motto is: "Make
> it explicit".

Going back to something discussed upthread: would you expect to use this
for hard regs as well as pseudos?  No-op moves aren't necessarily supported
for all hard registers.  E.g. MIPS doesn't have patterns for LO <- LO,
even though LO is a normal non-fixed register.  You can also have hard
registers that only appear in fixed patterns, such as condition code REGs.

If we went for an explicit move, I assume we would either have to
(a) discount hard regs that can't be moved, (b) force backends to
allow all no-op moves or (c) circumvent the backend somehow.
(Jan suggested a USE for (c), but I assume we'd want some sort
of definition too.)

Kenny said that pseudos-only was better than nothing, and I can't
disagree with that.  But one of the nice things about the on-the-side
idea is that you have none of these problems.  There should be nothing
special about hard regs.

I take your point about not wanting something special going on behind
the scenes.  But these insns seem pretty special in their own right,
especially if we go for (a) or (c).  Even if we go for (b), wouldn't
optimisers need to know that they shouldn't just delete the moves?

Richard


Re: birthpoints in rtl.

2008-03-04 Thread Kenneth Zadeck
Richard Sandiford wrote:
> "Steven Bosscher" <[EMAIL PROTECTED]> writes:
>   
>> For the location of the extra instructions, I would *not* keep them on
>> the side.  If you have something special going on, my motto is: "Make
>> it explicit".
>> 
>
> Going back to something discussed upthread: would you expect to use this
> for hard regs as well as pseudos?  No-op moves aren't necessarily supported
> for all hard registers.  E.g. MIPS doesn't have patterns for LO <- LO,
> even though LO is a normal non-fixed register.  You can also have hard
> registers that only appear in fixed patterns, such as condition code REGs.
>
> If we went for an explicit move, I assume we would either have to
> (a) discount hard regs that can't be moved, (b) force backends to
> allow all no-op moves or (c) circumvent the backend somehow.
> (Jan suggested a USE for (c), but I assume we'd want some sort
> of definition too.)
>
> Kenny said that pseudos-only was better than nothing, and I can't
> disagree with that.  But one of the nice things about the on-the-side
> idea is that you have none of these problems.  There should be nothing
> special about hard regs.
>
> I take your point about not wanting something special going on behind
> the scenes.  But these insns seem pretty special in their own right,
> especially if we go for (a) or (c).  Even if we go for (b), wouldn't
> optimisers need to know that they shouldn't just delete the moves?
>
> Richard
>   
This is the first concrete argument that i have seen for keeping them on
the side.  The rest of the discussions (including my own) have been
mostly beauty as opposed to truth.

From my point of view, this is a killer argument that if we want to
build fuds/birthpoints for all regs, the info must be on the side.

I want to build the info for everything because I want to ditch the
reaching defs way of building chains in favor of building them using the
existing ssa technology.   I do not want to have to call both techniques
to build the chains.

kenny


Re: birthpoints in rtl.

2008-03-04 Thread Diego Novillo

On 3/4/08 2:38 PM, Kenneth Zadeck wrote:

> Steven Bosscher wrote:
>> On Tue, Mar 4, 2008 at 7:58 PM, Diego Novillo <[EMAIL PROTECTED]> wrote:
>>>  Both PHIs and birthpoints are merely factoring devices that let you cut
>>>  down the number of UD links.  They don't need to be part of the IL, much
>>>  like none of the DF objects are part of the RTL IL.
>>
>> Maybe they don't need to be, but it may be useful to have them anyway.
>>
>>>  > What about keeping things up-to-date after applying some
>>>  > transformations? It is already hard to keep UD/DU chains up-to-date
>>>  > now (I don't think any pass successfully does so right now). This
>>>  > should be a lot easier if you fully factorize your UD chains, right?
>>>
>>>  In theory, yes.  Code for keeping these things up-to-date already exists
>>>  in GIMPLE SSA.
>>
>> That code is IMHO just awfully ugly. And slow too, last I checked.  I
>> don't know if this is still true, but update_ssa used to walk a huge
>> part of the dominator tree even for seemingly trivial updates. We
>> should not want that on RTL.  I don't think we should allow
>> transformations on RTL that are too hard to manually update the FUD
>> chains somehow.
>>
>> Gr.
>> Steven
>
> There are many differences between fud/birthpoints and real ssa:
>
> 1) In ssa, the operands of the phis and the renaming contain
> information.   The operands are paired with the cfg edges that the
> values come in on.   In fud/birthpoints there is no such pairing or
> renaming.


Yes, there is in FUD chains.  They keep PHI nodes with arguments paired 
to their edges.  Birthpoints do not keep this pairing.




> 2) There are two "kinds" of ssa algorithms that are used in gcc.   There
> are dirty ssa algorithms and "clean" algorithms.  Dirty algorithms do
> not keep ssa up to date as you make the transformations.  Clean ones
> do.


Popular demand.

Passes are encouraged to keep SSA up-to-date themselves, but we also 
have mechanisms for: (1) an API to register SSA updates in a sub-graph, 
(2) a way of introducing new symbols and have them put in SSA form. 
When FUD chains are altered, they can be renamed from scratch by putting 
their symbols in the to-rename list, or they can be kept manually by 
each pass.  The latter is usually faster.



> I have never understood why it was necessary for gcc to use so many
> dirty ssa algorithms.  When NaturalBridge built our compiler, we were
> able to use almost exclusively clean algorithms.   Tricks like loop
> closed ssa form, help, but in general this is a matter of care on the
> algorithmic design side and implementation.   We never had something
> like rebuild ssa at NaturalBridge.


> I do not know how much this changes with memory ssa.


Memory SSA *is* FUD chains.


> I assume clean algorithms are harder here, but I have no experience with it.


Not all that harder, though it's generally easier to just mark the 
affected symbols for renaming.  That makes things slower, though.  The 
updater does not have to put the whole program in SSA form again, but it 
does have to traverse the whole CFG looking for defs/uses of the
affected symbols.




> I would love to see the rtl back end use phis and renaming rather than
> fuds/birthpoints.  The thing is that for phi functions to really be
> profitable, you need to have a large number of passes in a row that are ssa
> clean.  So my plan was to basically start small and just use
> fud/birthpoints to control the space/time of the existing suite of passes.


Makes sense.


Diego.


Re: birthpoints in rtl.

2008-03-04 Thread Jan Hubicka
Hi,
> 
> 1) In ssa, the operands of the phis and the renaming contain
> information.   The operands are paired with the cfg edges that the
> values come in on.   In fud/birthpoints there is no such pairing or
> renaming.   For some problems, like conditional constant, this pairing
> and renaming is what makes the algorithm work.  You actually do not get
> the same answer (you get an inferior but still correct answer) if you do
> not use the pairing and renaming.

I must be quite lost here.  The non-rewriting SSA (or what I think
FUD chains are) is in my view essentially just an alternative representation
of an SSA program.  Instead of having SSA_NAMES and PHI nodes in your IL
directly, they sit in an on-side data structure. They hold the same information:
version numbers and PHI nodes associated with edges of the CFG. For
optimization passes they are however 100% equivalent; you just look at
different places in memory, which should be more or less hidden behind an
abstraction.

Surely with this representation all the SSA analysis algorithms will
work, since what you see is SSA form.  The difference is that you can't
simply use a particular SSA name at any place in a program without adding
code to copy the value to a register at the place where it is defined, to be
sure that the original location is not overwritten.

This is relatively little extra hassle compared to rewritten SSA form,
and in the case of conditional constant propagation you don't even need to
worry about that. It is not too different from the discussions about whether
you should have an on-side CFG and duplicate the info in goto statements, or
the CFG as part of the IL.

Given that RTL deals with architectural details like partial writes or
hard registers, it seems to make sense to actually target FUD (or
non-rewriting SSA) rather than trying to adjust RTL to allow SSA in
some form on all those constructs explicitly. Or at least it doesn't seem
significantly inferior to me, and it is a lot easier to accomplish.

Honza


Re: birthpoints in rtl.

2008-03-04 Thread Steven Bosscher
On Tue, Mar 4, 2008 at 8:47 PM, Richard Sandiford
<[EMAIL PROTECTED]> wrote:
> "Steven Bosscher" <[EMAIL PROTECTED]> writes:
>  Going back to something discussed upthread: would you expect to use this
>  for hard regs as well as pseudos?  No-op moves aren't necessarily supported
>  for all hard registers.  E.g. MIPS doesn't have patterns for LO <- LO,
>  even though LO is a normal non-fixed register.  You can also have hard
>  registers that only appear in fixed patterns, such as condition code REGs.

Yes, for hard registers you can't use this. Another example is the
loop counter register on ia64, or the flags register on i386.

You should have a look at the history of RTL SSA for hard registers
(http://gcc.gnu.org/ml/gcc-patches/2000-07/msg01285.html and thread).
They used to put selected hard regs into SSA form. Lessons learned:
don't do that.  I think the same applies to FUD chains for hard
registers.


>  Kenny said that pseudos-only was better than nothing, and I can't
>  disagree with that.  But one of the nice things about the on-the-side
>  idea is that you have none of these problems.  There should be nothing
>  special about hard regs.

Uh, hard registers *are* special.  And, also quite important, what
would you *do* with FUD chains for hard registers?  We don't optimize
many things with hard registers, usually just because it's harder to
do than for pseudos.  I don't think FUD chains would change that.

>  I take your point about not wanting something special going on behind
>  the scenes.  But these insns seem pretty special in their own right,
>  especially if we go for (a) or (c).  Even if we go for (b), wouldn't
>  optimisers need to know that they shouldn't just delete the moves?

Old RTL SSA did (a).  It didn't work.  Neither (b) nor (c) seem viable
ideas to me...

Gr.
Steven


Re: birthpoints in rtl.

2008-03-04 Thread Kenneth Zadeck
Jan Hubicka wrote:
> Hi,
>   
>> 1) In ssa, the operands of the phis and the renaming contain
>> information.   The operands are paired with the cfg edges that the
>> values come in on.   In fud/birthpoints there is no such pairing or
>> renaming.   For some problems, like conditional constant, this pairing
>> and renaming is what makes the algorithm work.  You actually do not get
>> the same answer (you get an inferior but still correct answer) if you do
>> not use the pairing and renaming.
>> 
>
> I must be quite lost here.  The non-rewriting SSA (or what I think
> FUD chains are) is in my view essentially just an alternative representation
> of an SSA program.  Instead of having SSA_NAMES and PHI nodes in your IL
> directly, they sit in an on-side data structure. They hold the same information:
> version numbers and PHI nodes associated with edges of the CFG. For
> optimization passes they are however 100% equivalent; you just look at
> different places in memory, which should be more or less hidden behind an
> abstraction.
>
> Surely with this representation all the SSA analysis algorithms will
> work, since what you see is SSA form.  The difference is that you can't
> simply use a particular SSA name at any place in a program without adding
> code to copy the value to a register at the place where it is defined, to be
> sure that the original location is not overwritten.
>
> This is relatively little extra hassle compared to rewritten SSA form,
> and in the case of conditional constant propagation you don't even need to
> worry about that. It is not too different from the discussions about whether
> you should have an on-side CFG and duplicate the info in goto statements, or
> the CFG as part of the IL.
>
> Given that RTL deals with architectural details like partial writes or
> hard registers, it seems to make sense to actually target FUD (or
> non-rewriting SSA) rather than trying to adjust RTL to allow SSA in
> some form on all those constructs explicitly. Or at least it doesn't seem
> significantly inferior to me, and it is a lot easier to accomplish.
>
> Honza
>   
I think that at this point, I have been convinced to:

1) use fuds rather than birthpoints, because these do keep a slot for
the value along each incoming edge.
2) keep the info on the side (see rsandifo's diverging thread).

I am not there on keeping extra names on the side.  The advantage of
the extra names is that they give you extra freedom.  The disadvantage
is that either the transformations are more expensive or getting out of
the renamed form is expensive.

Again, if we have a suite of contiguous converted passes in a row, I
could be swayed in the direction of renaming on the side, especially if
they butted up against the rtl generation step and avoided the out of
ssa for the tree ssa.  But that is a long time in the future, and I do
not see the short term benefits.

kenny


Re: Possible GCC 4.3 driver regression caused by your patch

2008-03-04 Thread Greg Schafer
On Mon, Mar 03, 2008 at 08:11:30AM -0500, Hans-Peter Nilsson wrote:
> On Sun, 2 Mar 2008, Greg Schafer wrote:
> > Hi Carlos and Mark,
> >
> > Your "Relocated compiler should not look in $prefix" patch here:
> >
> > http://gcc.gnu.org/ml/gcc/2006-10/msg00280.html
> >
> > appears to have caused a regression in my GCC 4.3 testing.
> 
> So *now* I know why my cross-test setup to (non-sysrooted)
> cris-axis-linux-gnu has trouble finding startfiles and
> pre-installed include files!  Thanks!  It seems Carlos' fix for
> the testsuite has some flaw I'll look into.  At the very least,
> cut-and-pasting commands from the dejagnu .log files doesn't work;
> there's some environment variable (more than just
> GCC_EXEC_PREFIX, AFAICT).  And some testsuites (forgot, maybe it
> was libgomp?) need to be adjusted too.

On a related note, this patch has also caused a testsuite regression for me
as evidenced by:

WARNING: Could not compile g++.dg/compat/struct-layout-1 generator
WARNING: Could not compile gcc.dg/compat/struct-layout-1 generator

My context is building up a new system inside a chroot whereby I'm
configuring GCC with --prefix=/usr but the "host" GCC is in some other
prefix. The patch tries to fix the testsuite infrastructure by adding
"set GCC_EXEC_PREFIX \"$(libdir)/gcc/\"" to site.exp. But in my scenario,
this results in:

gcc: error trying to exec 'cc1': execvp: No such file or directory
gcc: error trying to exec 'cc1': execvp: No such file or directory
gcc: error trying to exec 'cc1': execvp: No such file or directory

when trying to build the generator programs. Ughh..

This patch has caused regressions for me and others. There must be a way to
keep relocated compilers happy and ALSO not break existing setups that have
been working for many years.. I'll file a PR.

Thanks
Greg


Re: atomic accesses

2008-03-04 Thread Segher Boessenkool
The Linux kernel, and probably some user-space applications and
libraries as well, depend on GCC guaranteeing (a variant of) the following:
"any access to a naturally aligned scalar object in memory
that is not a bit-field will be performed by a single machine
instruction whenever possible"
and it seems the current compiler actually does work like this.

Seems a pity to have the bit-field exception here, why is it there?

Bit-fields will generally require a read-modify-write instruction,
and I don't think we actually guarantee to generate one right now.

Well if they do require more than one instruction, the rule has
no effect ("whenever possible"). If they can be done in one
instruction (as on the x86), then why not require this, why
make a special case?

Because current GCC doesn't work like this AFAIK.  I'm aiming for
a documentation-only change here, we can always extend it later.


Fair enough, we don't want to document something we don't do!

Does this rule extend to the use of floating-point instructions
to guarantee atomic access to 64-bit long_long_integer, as
written it does!


Good point.  Suggestions for better wording?  How does

"any access to a naturally aligned scalar object in memory
that is not a bit-field and fits in a general purpose integer
machine register, will be performed by a single machine
instruction whenever possible"

or

"any access to a naturally aligned scalar object in memory
that is not a bit-field and not bigger than a long int,
will be performed by a single machine instruction whenever
possible"

sound?
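
The second proposed wording can be read as a simple predicate on an object's address and size. A hypothetical C sketch (the function name is made up; it only restates the wording and says nothing about which instructions GCC actually emits): an object is naturally aligned when its address is a multiple of its size, and it qualifies when it is also no bigger than a long. Under this reading a 64-bit long long on a target with a 32-bit long falls outside the rule, which avoids the floating-point question above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: nonzero if an object of SIZE bytes at ADDR is
   naturally aligned (its address is a multiple of its size) and no
   bigger than a long, i.e. it falls under the second proposed wording.
   Bit-fields are not modelled: they have no address of their own.  */
static int
covered_by_proposed_rule (const void *addr, size_t size)
{
  if (size == 0 || size > sizeof (long))
    return 0;
  return (uintptr_t) addr % size == 0;
}
```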


Segher



Re: birthpoints in rtl.

2008-03-04 Thread Richard Sandiford
"Steven Bosscher" <[EMAIL PROTECTED]> writes:
> On Tue, Mar 4, 2008 at 8:47 PM, Richard Sandiford
> <[EMAIL PROTECTED]> wrote:
>> "Steven Bosscher" <[EMAIL PROTECTED]> writes:
>>  Going back to something discussed upthread: would you expect to use this
>>  for hard regs as well as pseudos?  No-op moves aren't necessarily supported
>>  for all hard registers.  E.g. MIPS doesn't have patterns for LO <- LO,
>>  even though LO is a normal non-fixed register.  You can also have hard
>>  registers that only appear in fixed patterns, such as condition code REGs.
>
> Yes, for hard registers you can't use this. Another example is the
> loop counter register on ia64, or the flags register on i386.
>
> You should have a look at the history of RTL SSA for hard registers
> (http://gcc.gnu.org/ml/gcc-patches/2000-07/msg01285.html and thread).
> They used to put selected hard regs into SSA form. Lessons learned:
> don't do that.  I think the same applies to FUD chains for hard
> registers.

Yeah, I remember the RTL-SSA stuff.  And for avoidance of doubt,
I wasn't advocating explicit hard-reg moves.  I was trying to figure
out whether you were.  Clearly you weren't. ;)

>>  Kenny said that pseudos-only was better than nothing, and I can't
>>  disagree with that.  But one of the nice things about the on-the-side
>>  idea is that you have none of these problems.  There should be nothing
>>  special about hard regs.
>
> Uh, hard registers *are* special.  And, also quite important, what
> would you *do* with FUD chains for hard registers?  We don't optimize
> many things with hard registers, usually just because it's harder to
> do than for pseudos.  I don't think FUD chains would change that.

I don't see why hard registers are special as far as FUD chains go.
We have DU chains for hard regs, so why not FUDs too?

Richard


Re: plugin includes for MELT

2008-03-04 Thread Ralf Wildenhues
* Basile STARYNKEVITCH wrote on Thu, Feb 28, 2008 at 06:56:35PM CET:
> Ralf Wildenhues wrote:
>> * Basile STARYNKEVITCH wrote on Thu, Feb 28, 2008 at 05:39:47PM CET:

 run-basilys.d: run-basilys.h \
$(CONFIG_H) $(SYSTEM_H) $(TIMEVAR_H) $(TM_H)  $(TREE_H)  $(GGC_H)  \
tree-pass.h basilys.h gt-basilys.h
   $(CC) -MT run-basilys-deps -MMD  $(ALL_CFLAGS) $(ALL_CPPFLAGS) $<
>>
>> The build compiler may not be gcc and may not understand -MT and -MMD.
>
> Yes, I know. But how can I avoid that?

Use depcomp.  Rather than explaining how to do that, it's easier if you
wait until Tom puts the depcomp support code in the tree, then use it
the same way the other stuff will.

>> Wasn't there a proposal to use depcomp in gcc a while ago?
>
> What is depcomp exactly?

A compiler wrapper that helps to do dependency computation as a
side-effect of compilation.  IIRC it first appeared in Automake
(though its authors are also GCC developers, I'm not sure who
initiated it).

mkdir -p $(melt_build_include_dir); \
>>
>> mkdir -p is not portable, use $(mkinstalldirs).
>
> Is $(mkinstalldirs) usable for non-installed directories?

Yes.

> (in other words, it does not do any chown or chmod?)

Exactly.

Cheers,
Ralf


Re: atomic accesses

2008-03-04 Thread Paul Koning
> "Segher" == Segher Boessenkool <[EMAIL PROTECTED]> writes:

 Segher> Good point.  Suggestions for better wording?  How does

 Segher> "any access to a naturally aligned scalar object in memory
 Segher> that is not a bit-field and fits in a general purpose integer
 Segher> machine register, will be performed by a single machine
 Segher> instruction whenever possible"

 Segher> or

 Segher> "any access to a naturally aligned scalar object in memory
 Segher> that is not a bit-field and not bigger than a long int, will
 Segher> be performed by a single machine instruction whenever
 Segher> possible"

 Segher> sound?

As I said before, I think any words of this form SHOULD NOT be added.
All it does is add words to the documentation that provide NO
guarantee of anything -- but in a way that will confuse those who
don't read it carefully enough into thinking that they DID get some
sort of guarantee.

In other words, a statement like that has clear negative value.

   paul



Re: birthpoints in rtl.

2008-03-04 Thread Steven Bosscher
On Tue, Mar 4, 2008 at 9:46 PM, Richard Sandiford
<[EMAIL PROTECTED]> wrote:
>  I don't see why hard registers are special as far as FUD chains go.
>  We have DU chains for hard regs, so why not FUDs too?

We have them, but does anyone use them?  Does anyone actually even
compute them?  (Apparently fwprop does.)

It all depends on what you want to do with this.  If you want to make
it easier to do the existing optimizations over a factored UD/DU web,
then ignoring hard registers would have made sense: Easier, less
expensive to compute; easier to maintain; easier to rewrite into SSA
(pseudos are not shared); etc.

But Kenny has already said he wants to have a full replacement for
reaching defs, so the point's become moot :-)

Gr.
Steven


Re: birthpoints in rtl.

2008-03-04 Thread Jan Hubicka
> I think that at this point, i have been convinced to:
> 
> 1) use fud's rather than birthpoints because these do keep a slot for
> the value along each in edge.
> 2) keep the info on the side (see rsandiford's diverging thread).
> 
> I am not there on keeping extra names on the side.   The advantage of
> the extra names is that it gives you extra freedom.   the disadvantage
> is that either the transformations are more expensive or getting out of
> the renamed form is expensive.  

The names are equivalent to UD pointers:  Either you can have version
names or just consider the destination of UD pointer to be the
destination.  Or am I still missing a point?
> 
> Again, if we have a suite of contiguous converted passes in a row, i
> could be swayed in the renaming on the side direction, especially if
> they butted up against the rtl generation step and avoided the out of
> ssa for the tree ssa.  But that is a long time in the future, and i do

Generating RTL and building FUDs based on the existing tree-SSA form is
doable, though I am not sure how practical it is.  The value would be a
means of transferring fine-grained info to the RTL level.  Definitely not
a step we want to take tomorrow ;)

I guess we can just keep rebuilding FUDs the way we rebuild DU/UD on RTL
now.  It should not be any more expensive, I hope (it definitely should
not have the extreme cases that DU/UD has); I am not sure how the
average construction time for FUDs/SSA compares to DU/UD construction.
The algorithms seem comparably complex to my eyes.

Gradually we can update passes to maintain the info.  It is all a bit
slippery at the RTL level, since most transformations go through the RTL
emit machinery, which is allowed to introduce fancy things, clobber
registers, add temporaries and do all that stuff.

I believe that FUD on hard regs is doable and practical: I don't see how
the rewriting-SSA problems hit by the RTL-SSA project map here.  Overall,
I believe the basic disappointing lesson from the RTL-SSA project was not
SUBREGs/STRICT_LOW_PARTs and other issues, but the fact that RTL is so
hard to modify: everything you do goes through the target validation
machinery or expansion and can behave irregularly, which does not play
well with standard optimization algorithms.  Plus there are ugly things
like libcalls or other notes that were a lot more important in GCC at
the time of the RTL-SSA project.

So in the end, adding a sane analysis framework didn't let you write the
easy high-level transformations RTL-SSA was originally intended for.
With the Gimple optimizer in place, however, we are not targeting this
kind of stuff on RTL anymore.  We want sane dataflow info for guiding
basic stuff (DCE/CCP/register allocation/GCSE).

Honza
> not see the short term benefits.
> 
> kenny


Re: birthpoints in rtl.

2008-03-04 Thread Jan Hubicka
> > I think that at this point, i have been convinced to:
> > 
> > 1) use fud's rather than birthpoints because these do keep a slot for
> > the value along each in edge.
> > 2) keep the info on the side (see rsandiford's diverging thread).
> > 
> > I am not there on keeping extra names on the side.   The advantage of
> > the extra names is that it gives you extra freedom.   the disadvantage
> > is that either the transformations are more expensive or getting out of
> > the renamed form is expensive.  
> 
> The names are equivalent to UD pointers:  Either you can have version
> names or just consider the destination of UD pointer to be the
> destination.  Or am I still missing a point?

... well perhaps in better Czenglish ...
Either you can have version names and build UD pointers from the known
definition point of each version, or you can consider the UD pointer
itself to be the version name.

Honza


Re: birthpoints in rtl.

2008-03-04 Thread Diego Novillo

On 3/4/08 4:06 PM, Jan Hubicka wrote:


> The names are equivalent to UD pointers:  Either you can have version
> names or just consider the destination of UD pointer to be the
> destination.  Or am I still missing a point?


Nope, that's exactly right.  Versioned names are useful for some things 
(mostly keeping attributes/values/etc in arrays indexed by name 
version), but straight pointers are also doable.
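A minimal sketch of the equivalence Honza and Diego describe, using hypothetical structs rather than GCC's df.h types: a use points at its single reaching definition, so the def pointer itself can serve as the name, while an explicit version number makes it cheap to keep per-name data in a plain array indexed by version.

```c
/* Hypothetical definition record; the version field is only needed
   in the versioned-name scheme. */
struct def {
    int regno;
    int version;
};

/* In FUD form each use has exactly one reaching definition. */
struct use {
    int regno;
    struct def *reaching_def;
};

/* Pointer scheme: the "name" of a use is the def it points to. */
static const struct def *use_name(const struct use *u)
{
    return u->reaching_def;
}

/* Versioned scheme: attributes live in an array indexed by version,
   e.g. a known constant value for each version of a register. */
static long lookup_const(const long *const_by_version, const struct use *u)
{
    return const_by_version[u->reaching_def->version];
}
```

Both functions reach the same information through the same pointer; the version number just adds a dense index for side tables.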



> I believe that FUD on hard regs is doable and practical: I don't see how
> the rewriting SSA problems hit by RTL-SSA project map here and overall I
> believe the basic disappointment lesson from RTL-SSA project was not
> SUBREGs/STRICT_LOW_PARTs and other issues, but the fact that RTL is that
> hard to modify: everything you do go through target validation machinery
> or expansion and can behave irregularly that does not play well with
> standard optimization algorithms plus there are ugly things like libcall
> or other notes that was a lot more important in GCC of RTL-SSA project
> time.


Yes, we should not try to do a rewriting SSA for the time being.  In 
this we are all in agreement.  Nobody is advocating a rewriting SSA form 
on RTL at the moment.  Maybe in the future, but for now building FUD 
chains on the DF framework is Not Hard.


DF already has support for rebuilding the UD chains.  So, rebuilding 
FUDs will probably be straightforward and it may be a bit simpler.  With 
FUD chains you are trading the complexity of computing dominance 
frontiers and (maybe) PHI pruning with the setting of more UD chains.
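For readers weighing the cost Diego mentions, here is a sketch of one standard way to compute dominance frontiers, the algorithm of Cooper, Harvey and Kennedy. It is not the DF framework's code. It assumes a small CFG (at most 32 blocks, so each frontier fits in a bitmask), that every block is reachable, and that immediate dominators are already known; the struct and function names are hypothetical.

```c
#include <stdint.h>

#define MAX_BLOCKS 32   /* small fixed limit so a frontier fits in a uint32_t */

/* Hypothetical CFG representation (not GCC's basic_block API). */
struct cfg {
    int n_blocks;
    int n_preds[MAX_BLOCKS];
    int preds[MAX_BLOCKS][MAX_BLOCKS];  /* preds[b][i]: i-th predecessor of b */
    int idom[MAX_BLOCKS];               /* immediate dominator; idom[entry] == entry */
};

/* For every join point b, walk up the dominator tree from each
   predecessor until reaching idom[b], adding b to the frontier of
   every block passed.  Bit j of df[b] is set iff j is in DF(b). */
static void compute_df(const struct cfg *g, uint32_t df[])
{
    for (int b = 0; b < g->n_blocks; b++)
        df[b] = 0;

    for (int b = 0; b < g->n_blocks; b++) {
        if (g->n_preds[b] < 2)
            continue;                   /* only join points contribute */
        for (int i = 0; i < g->n_preds[b]; i++) {
            int runner = g->preds[b][i];
            while (runner != g->idom[b]) {
                df[runner] |= UINT32_C(1) << b;
                runner = g->idom[runner];
            }
        }
    }
}
```

On a diamond (0 branches to 1 and 2, which both fall into 3), DF(1) and DF(2) are {3}, which is where a PHI for a register set in 1 or 2 would go.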



Diego.


Re: atomic accesses

2008-03-04 Thread Segher Boessenkool

> As I said before, I think any words of this form SHOULD NOT be added.
> All it does is add words to the documentation that provide NO
> guarantee of anything -- but in a way that will confuse those who
> don't read it carefully enough into thinking that they DID get some
> sort of guarantee.

The idea is to _do_ provide that guarantee.  If the GCC code does not agree
with the GCC documentation, the code has a bug ;-)

> In other words, a statement like that has clear negative value.

I disagree.  People are relying on this undocumented GCC behaviour already,
and when things break, chaos ensues.  If we change this to be documented
behaviour, at least it is clear where the problem lies (namely, with the
compiler), and things can be fixed easily.

The two big questions are:

1) Do we *want* to guarantee any behaviour in this area?

2) Exactly *what* behaviour?


Segher



Re: atomic accesses

2008-03-04 Thread Andrew Haley

Segher Boessenkool wrote:

>> As I said before, I think any words of this form SHOULD NOT be added.
>> All it does is add words to the documentation that provide NO
>> guarantee of anything -- but in a way that will confuse those who
>> don't read it carefully enough into thinking that they DID get some
>> sort of guarantee.
>
> The idea is to _do_ provide that guarantee.  If the GCC code does not agree
> with the GCC documentation, the code has a bug ;-)
>
>> In other words, a statement like that has clear negative value.
>
> I disagree.  People are relying on this undocumented GCC behaviour already,
> and when things break, chaos ensues.  If we change this to be documented
> behaviour, at least it is clear where the problem lies (namely, with the
> compiler), and things can be fixed easily.
>
> The two big questions are:
>
> 1) Do we *want* to guarantee any behaviour in this area?
>
> 2) Exactly *what* behaviour?


This would be a gcc extension.  History does not favour such extensions:
we've been unable to define them well enough, for one thing.

Andrew.


Re: atomic accesses

2008-03-04 Thread Paul Koning
> "Segher" == Segher Boessenkool <[EMAIL PROTECTED]> writes:

 >> As I said before, I think any words of this form SHOULD NOT be
 >> added.  All it does is add words to the documentation that provide
 >> NO guarantee of anything -- but in a way that will confuse those
 >> who don't read it carefully enough into thinking that they DID get
 >> some sort of guarantee.

 Segher> The idea is to _do_ provide that guarantee.  If the GCC code
 Segher> does not agree with the GCC documentation, the code has a bug
 Segher> ;-)

 >> In other words, a statement like that has clear negative value.

 Segher> I disagree.  People are relying on this undocumented GCC
 Segher> behaviour already, and when things break, chaos ensues.  If
 Segher> we change this to be documented behaviour, at least it is
 Segher> clear where the problem lies (namely, with the compiler), and
 Segher> things can be fixed easily.

 Segher> The two big questions are:

 Segher> 1) Do we *want* to guarantee any behaviour in this area?

 Segher> 2) Exactly *what* behaviour?

Yes, that's the question.

First of all, the text you supplied does not create any guarantee at
all.  It says that "whenever possible" GCC will do x.  Translation:
for any given bit of source, target, switches, etc., that means GCC
may do x in that case -- and it also means it may NOT do x in that
case.  Either outcome is legal by the text you proposed.  There is no
bug in GCC, whether it does x or (not x).

So you're not adding a guarantee.  But even though it isn't a
guarantee, it may cause some people to think it is one.  In fact, I'm
tempted to say it's doing that to you.

Now, suppose we take out "whenever possible" and replace it by
"always".  Then it IS a guarantee, and if GCC generates multiple
instructions, it's a GCC bug.  (If we propose to follow this path, do
we have any idea how many instances of that bug exist right now in the
current code generators?)

But what does such a statement guarantee?  Atomic access?

What exactly does "atomic access" mean?  It might mean, as one of the
earlier notes said, that in a single writer multiple reader setting
you will only ever see the "before" or the "after" states but not the
one in between.

It's probably true for most architectures (perhaps even for all that
GCC supports) that this limited interpretation of "atomic" is
satisfied when the load or store is a single instruction, aligned, the
right size, etc.

Another possible interpretation of "atomic" is "if there are multiple
writers, one write won't interfere with the other". For example, if
one writer updates X, and another updates Y, two aligned variables
adjacent in memory, the final outcome has the new X and the new Y.

That interpretation in general is NOT satisfied simply by using a
single instruction for store.  Maybe it is on x86 -- but not
necessarily so on RISC machines.
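One way the multiple-writer interpretation fails can be shown with a small deterministic simulation. It assumes a hypothetical machine whose only stores are 32-bit words (early Alpha implementations, for instance, had no byte stores), so storing one byte means load word, merge byte, store word. X and Y are adjacent bytes of one word; the function names are illustrative only.

```c
#include <stdint.h>

/* Replace byte IDX (0 = lowest) of WORD with VAL. */
static uint32_t set_byte(uint32_t word, int idx, uint8_t val)
{
    uint32_t shift = 8u * (uint32_t)idx;
    return (word & ~(0xFFu << shift)) | ((uint32_t)val << shift);
}

static uint8_t get_byte(uint32_t word, int idx)
{
    return (uint8_t)(word >> (8u * (uint32_t)idx));
}

/* Writer A sets X (byte 0), writer B sets Y (byte 1), and their
   load/merge/store sequences interleave. */
static uint32_t lost_update_demo(void)
{
    uint32_t mem = 0;               /* X = 0, Y = 0 */
    uint32_t a = mem;               /* A loads the containing word */
    uint32_t b = mem;               /* B loads the same word */
    a = set_byte(a, 0, 0x11);       /* A merges the new X */
    b = set_byte(b, 1, 0x22);       /* B merges the new Y */
    mem = a;                        /* A stores */
    mem = b;                        /* B stores, writing back the stale X */
    return mem;                     /* 0x2200: A's update to X is lost */
}

/* The same two updates run one after the other keep both values. */
static uint32_t sequential_demo(void)
{
    uint32_t mem = 0;
    mem = set_byte(mem, 0, 0x11);   /* A: load, merge, store */
    mem = set_byte(mem, 1, 0x22);   /* B: load, merge, store */
    return mem;                     /* 0x2211 */
}
```

Each writer touches only its own variable, yet the interleaved result keeps the new Y and the old X.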

So, even with the hard requirement for single instruction load/store,
it isn't clear what conclusion a programmer is supposed to draw from
the statement under consideration.

The discussion is about atomicity.  Talking about single instructions
is seriously misleading, because there is only a weak connection
between the two.  It DOES NOT matter to a programmer whether a C
assignment generates one instruction or twenty; what matters are the
semantics guaranteed for that statement.

If we want to have atomicity properties of plain language C
constructs, let's have a statement of exactly what atomicity
properties are to be guaranteed. NOT in terms of generated code, but
in terms of abstract semantics.  It may well be that the most
desirable atomicity semantics are too expensive -- you'd end up with
constraints similar to "volatile", or even more so.  But suppose we
could have a particular guarantee.  Then we can see if what "people"
are relying on is in fact addressed by that guarantee, or if they were
expecting a stronger guarantee that they are simply NOT going to get
from GCC (not unless they invoke specific atomic_foo builtins).  If
the former, then GCC has cured a bug in the original application
(perhaps at the expense of work in GCC); if the latter, then the
application bug is still there.

  paul




Re: atomic accesses

2008-03-04 Thread Paul Brook
> AFAIK the only reason we don't break this rule is that doing so would
> be grossly inefficient; there's nothing to stop any gcc back-end with
> (say) seriously slow DImode writes from using two SImode writes instead.

I'm fairly sure ARM already breaks this "rule".

Currently it probably only affects postincrement addressing modes.  However
there is definite scope for splitting loads/stores (even SI->2*HI) when 
optimizing for size.
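To see why such splitting matters even for the weak single-writer, multiple-reader sense of "atomic", here is a deterministic simulation of a 64-bit store done as two 32-bit stores (the DImode-as-two-SImode case). The struct and function names are hypothetical; the "reader" simply runs between the two halves of the write.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as when a back end
   splits a DImode store into two SImode stores. */
struct split64 {
    uint32_t lo, hi;
};

static uint64_t read_split(const struct split64 *s)
{
    return ((uint64_t)s->hi << 32) | s->lo;
}

/* Update 0x00000000FFFFFFFF to 0x0000000100000000, low half first.
   Returns what a reader sees between the two stores and stores the
   final value in *final_out. */
static uint64_t torn_read_demo(uint64_t *final_out)
{
    struct split64 x = { 0xFFFFFFFFu, 0x00000000u };   /* "before" */
    const uint64_t after = UINT64_C(0x0000000100000000);

    x.lo = (uint32_t)after;             /* first store: low half */
    uint64_t seen = read_split(&x);     /* reader runs here */
    x.hi = (uint32_t)(after >> 32);     /* second store: high half */

    *final_out = read_split(&x);
    return seen;                        /* 0: neither "before" nor "after" */
}
```

The reader observes 0, a value the program never stored.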

Paul


Re: atomic accesses

2008-03-04 Thread Ross Ridge
Segher Boessenkool writes:
>... People are relying on this undocumented GCC behaviour already,
>and when things break, chaos ensues.

GCC has introduced many changes over the years that have broken many
programs that have relied on undocumented or unspecified behaviour.
You won't find much sympathy for people who assume that GCC must behave
in some way where there is no requirement for it to do so.

>If we change this to be documented behaviour, at least it is clear
>where the problem lies (namely, with the compiler), and things can be
>fixed easily.

I don't think you'll find any support for imposing a requirement on GCC
that would always require it to use an "atomic" instruction when there
is an alternative instruction or sequence of instructions that would be
faster and/or shorter.  I think your best bet along these lines would
be adding __sync_fetch() and __sync_store() builtins, but doing so would
be more difficult than a simple documentation change.
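The __sync_fetch() and __sync_store() builtins suggested here do not exist. As a rough sketch only, similar helpers can be approximated with GCC's existing __sync_fetch_and_add and __sync_val_compare_and_swap builtins, which act as full barriers. The helper names are hypothetical, and implementing a plain load as an atomic read-modify-write is considerably more expensive than an ordinary load.

```c
/* Hypothetical stand-in for the proposed __sync_fetch(): adding zero
   atomically returns the current value. */
static long sync_fetch_long(long *p)
{
    return __sync_fetch_and_add(p, 0);
}

/* Hypothetical stand-in for the proposed __sync_store(): retry the
   compare-and-swap until V replaces the value we last observed. */
static void sync_store_long(long *p, long v)
{
    long old = *p;      /* only a first guess; the CAS validates it */
    long seen;

    while ((seen = __sync_val_compare_and_swap(p, old, v)) != old)
        old = seen;     /* another writer got in first; try again */
}
```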

Ross Ridge



Re: [RFC] GCC caret diagnostics

2008-03-04 Thread Ian Lance Taylor
"Manuel López-Ibáñez" <[EMAIL PROTECTED]> writes:

> Here is a patch that gives us caret diagnostics in C/C++. There are a lot
> of things that can be improved, but I wanted to get some
> feedback on my current approach.
> 
> Basically, I store a pointer linebuf in the line_map structure to a
> character in the input file buffer. The character corresponds to the
> first character in the line corresponding to TO_LINE in the line_map
> structure. The downside of this is that the buffer cannot be freed
> anymore. I am not sure whether this is better than storing a duplicate
> of the line as gfortran does. The third approach would be to store an
> offset and when generating diagnostics, reopen the file, fseek to the
> offset and print that line.
> 
> One line_map can contain information about several lines, so we still
> need to find the correct position for a line within linebuf. That is
> what the hack in expand_location is for. It would be nice to have a
> way to point directly to the beginning of each line: multiple pointers
> per line_map?

I like it.  I think the general approach is fine, but I think you
should free all the information when the frontend is complete--e.g.,
when it calls cgraph_finalize_compilation_unit.  That is, only give
caret warnings for diagnostics from the frontend.
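As a rough illustration of the linebuf approach, this sketch takes a pointer to the first character of a source line (what the proposed line_map field would supply once the right line is found) and a 1-based column, and formats the line followed by a caret. The function name is hypothetical, and it ignores tabs and multibyte characters, which a real implementation would have to account for when placing the caret.

```c
#include <stdio.h>
#include <string.h>

/* Write LINE (up to, not including, its newline) and a caret under
   1-based COLUMN into OUT.  Returns snprintf's result. */
static int format_caret(const char *line, int column, char *out, size_t outsz)
{
    int len = (int)strcspn(line, "\n");   /* length of this line only */

    if (column < 1)
        column = 1;
    /* %*s with an empty string pads with COLUMN - 1 spaces. */
    return snprintf(out, outsz, "%.*s\n%*s^", len, line, column - 1, "");
}
```

Because the line is read straight from the input buffer, this only works while that buffer is still alive, which is the trade-off raised above.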

Ian


Re: Google Summer of Code 2008

2008-03-04 Thread Ian Lance Taylor
"Doug Gregor" <[EMAIL PROTECTED]> writes:

> I see that it is time to submit applications to be a mentor
> organization for the Google Summer of Code. I've updated the GSoC wiki
> page at:
> 
>   http://gcc.gnu.org/wiki/SummerOfCode
> 
> with a class of projects I'm interested in; others should do the same.

Thanks.  I agree with Doug: please update the wiki page.


> Who is responsible for actually submitting GCC's application to GSoC,
> and who has done so in the past?

I have done so in the past, and indeed I have already submitted an
application for gcc for this year.

Ian


Re: [4.3/4.4]: PATCH: PR target/35453: nmmintrin.h defines macros SIDD_XXX

2008-03-04 Thread Ian Lance Taylor
"H.J. Lu" <[EMAIL PROTECTED]> writes:

> Here is the patch for both gcc 4.3 and 4.4. OK for 4.3/4.4? Tested on
> Linux/ia32 and Linux/ia64 with gcc 4.3/4.4.

> gcc/
> 
> 2008-03-03  H.J. Lu  <[EMAIL PROTECTED]>
> 
>   PR target/35453
>   * config/i386/smmintrin.h (SIDD_XXX): Renamed to ...
>   (_SIDD_XXX): This.
> 
> gcc/testsuite/
> 
> 2008-03-03  H.J. Lu  <[EMAIL PROTECTED]>
> 
>   PR target/35453
>   * gcc.target/i386/sse4_2-pcmpestri-1.c: Replace SIDD_XXX with
>   _SIDD_XXX.
>   * gcc.target/i386/sse4_2-pcmpestri-2.c: Likewise.
>   * gcc.target/i386/sse4_2-pcmpestrm-1.c: Likewise.
>   * gcc.target/i386/sse4_2-pcmpestrm-2.c: Likewise.
>   * gcc.target/i386/sse4_2-pcmpistri-1.c: Likewise.
>   * gcc.target/i386/sse4_2-pcmpistri-2.c: Likewise.
>   * gcc.target/i386/sse4_2-pcmpistrm-1.c: Likewise.
>   * gcc.target/i386/sse4_2-pcmpistrm-2.c: Likewise.
>   * gcc.target/i386/sse4_2-pcmpstr.h: Likewise.

This is OK for mainline.  I will defer to an RM for 4.3, though my
recommendation is that it should go into 4.3 if possible.

Thanks.

Ian


Re: [4.3/4.4]: PATCH: PR target/35453: nmmintrin.h defines macros SIDD_XXX

2008-03-04 Thread Ian Lance Taylor
Ian Lance Taylor <[EMAIL PROTECTED]> writes:

> This is OK for mainline.  I will defer to an RM for 4.3, though my
> recommendation is that it should go into 4.3 if possible.

Sorry, the thread broke, and I didn't see that this had already been
approved.

Ian


Help with GCC on Cygwin

2008-03-04 Thread Balaji V. Iyer
Hello Everyone,
I am trying to do some development on the C compiler in Cygwin, and I
am doing the following to build it:

$ ../gcc-4.0.2/gcc/configure --prefix=/home/Balaji/Software_Tools/install --enable-languages="c"

The problem I am getting is this:

$ make all install
TARGET_CPU_DEFAULT="" \
HEADERS="auto-host.h ansidecl.h config/i386/xm-cygwin.h" DEFINES="" \
/bin/sh ../gcc-4.0.2/gcc/mkconfig.sh config.h
TARGET_CPU_DEFAULT="" \
HEADERS="config/i386/i386.h config/i386/unix.h config/i386/bsd.h config/i386/gas.h config/dbxcoff.h config/i386/cygming.h config/i386/cygwin.h defaults.h" DEFINES="" \
/bin/sh ../gcc-4.0.2/gcc/mkconfig.sh tm.h
TARGET_CPU_DEFAULT="" \
HEADERS="auto-host.h ansidecl.h config/i386/xm-cygwin.h" DEFINES="" \
/bin/sh ../gcc-4.0.2/gcc/mkconfig.sh bconfig.h
/home/Balaji/Software_Tools/gcc-4.0.2/compile gcc -c   -g -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../gcc-4.0.2/gcc -I../gcc-4.0.2/gcc/build -I../gcc-4.0.2/gcc/../include -I../gcc-4.0.2/gcc/../libcpp/include -o build/genmodes.o ../gcc-4.0.2/gcc/genmodes.c
/home/Balaji/Software_Tools/gcc-4.0.2/compile gcc -c   -g -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../gcc-4.0.2/gcc -I../gcc-4.0.2/gcc/build -I../gcc-4.0.2/gcc/../include -I../gcc-4.0.2/gcc/../libcpp/include -o build/errors.o ../gcc-4.0.2/gcc/errors.c
make: *** No rule to make target `../build-i686-pc-cygwin/libiberty/libiberty.a', needed by `build/genmodes.exe'.  Stop.

I am currently using Cygwin on an x86 machine, and gcc version 4.0.2 (I
have to use this version; I can't use a different one).

Any help is very highly appreciated!

Thanking You,

Yours Sincerely,

Balaji V. Iyer.

PS. Here is the output I received right after I ran the configure
command.

checking build system type... i686-pc-cygwin
checking host system type... i686-pc-cygwin
checking target system type... i686-pc-cygwin
checking LIBRARY_PATH variable... ok
checking GCC_EXEC_PREFIX variable... ok
checking whether to place generated files in the source directory... no
checking whether a default linker was specified... no
checking whether a default assembler was specified... no
checking for gcc... gcc
checking for C compiler default output file name... a.exe
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... .exe
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking whether gcc and cc understand -c and -o together... yes
checking how to run the C preprocessor... gcc -E
checking for inline... inline
checking for long long int... yes
checking for __int64... no
checking for egrep... grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for void *... yes
checking size of void *... 4
checking for short... yes
checking size of short... 2
checking for int... yes
checking size of int... 4
checking for long... yes
checking size of long... 4
checking for long long... yes
checking size of long long... 8
checking whether gcc accepts -Wno-long-long... yes
checking whether gcc accepts -Wno-variadic-macros... no
checking whether gcc accepts -Wold-style-definition... yes
checking valgrind.h usability... no
checking valgrind.h presence... no
checking for valgrind.h... no
checking whether make sets $(MAKE)... yes
checking for gawk... gawk
checking whether ln -s works... yes
checking whether ln works... yes
checking for ranlib... ranlib
checking for a BSD compatible install... /usr/bin/install -c
checking for cmp's capabilities... gnucompare
checking for mktemp... yes
checking for makeinfo... makeinfo
checking for modern makeinfo... yes
checking for recent Pod::Man... yes
checking for flex... flex
checking for bison... bison
checking for nm... nm
checking for ar... ar
checking for GNU C library... no
checking for ANSI C header files... (cached) yes
checking whether time.h and sys/time.h may both be included... yes
checking whether string.h and strings.h may both be included... yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for limits.h... yes
checking for stddef.h... yes
checking for string.h... (cached) yes
checking for strings.h... (cached) yes
checking for stdlib.h... (cached) yes
checking for time.h... yes
checking for iconv.h... yes
checking for fcntl.

Re: Help with GCC on Cygwin

2008-03-04 Thread Ian Lance Taylor
"Balaji V. Iyer" <[EMAIL PROTECTED]> writes:

> I am trying to do some development on the C Compiler in Cygwin and I
> am doing the following to build it:

gcc@gcc.gnu.org is the wrong mailing list.  Please send any further
e-mail to [EMAIL PROTECTED]  Thanks.

> $ ../gcc-4.0.2/gcc/configure

Run ../gcc-4.0.2/configure, not ../gcc-4.0.2/gcc/configure.

Ian


Re: static array with constant size

2008-03-04 Thread Elazar Leibovich
I'm trying to compile the following piece of code:
static const int ln = 10;
static int ar[ln];
I'm getting:
storage size of 'ar' isn't constant
size of variable 'ar' is too large
Is the code legal? Can you provide me with references to its legality
or a discussion about it? It seems to compile with MS cl.exe.

Thanks


RE: Help with GCC on Cygwin

2008-03-04 Thread Balaji V. Iyer
Thank you Ian. I made the modification you mentioned, and now I am
running into more problems.

Now it is failing somewhere in libiberty. Here is the exact message (I
simply typed "make all install"; I get the same message when I just run
"make"):

Configuring in fixincludes
configure: loading cache ./config.cache
checking build system type... i686-pc-cygwin
checking host system type... i686-pc-cygwin
checking target system type... i686-pc-cygwin
checking for i686-pc-cygwin-gcc... gcc
checking for C compiler default output file name... a.exe
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... .exe
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking how to run the C preprocessor... gcc -E
checking for egrep... grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking stddef.h usability... yes
checking stddef.h presence... yes
checking for stddef.h... yes
checking for stdlib.h... (cached) yes
checking for strings.h... (cached) yes
checking for unistd.h... (cached) yes
checking fcntl.h usability... yes
checking fcntl.h presence... yes
checking for fcntl.h... yes
checking sys/file.h usability... yes
checking sys/file.h presence... yes
checking for sys/file.h... yes
checking for sys/stat.h... (cached) yes
checking for clearerr_unlocked... no
checking for feof_unlocked... no
checking for ferror_unlocked... no
checking for fflush_unlocked... no
checking for fgetc_unlocked... no
checking for fgets_unlocked... no
checking for fileno_unlocked... no
checking for fprintf_unlocked... no
checking for fputc_unlocked... no
checking for fputs_unlocked... no
checking for fread_unlocked... no
checking for fwrite_unlocked... no
checking for getchar_unlocked... yes
checking for getc_unlocked... yes
checking for putchar_unlocked... yes
checking for putc_unlocked... yes
checking whether abort is declared... yes
checking whether errno is declared... no
checking whether clearerr_unlocked is declared... no
checking whether feof_unlocked is declared... no
checking whether ferror_unlocked is declared... no
checking whether fflush_unlocked is declared... no
checking whether fgetc_unlocked is declared... no
checking whether fgets_unlocked is declared... no
checking whether fileno_unlocked is declared... no
checking whether fprintf_unlocked is declared... no
checking whether fputc_unlocked is declared... no
checking whether fputs_unlocked is declared... no
checking whether fread_unlocked is declared... no
checking whether fwrite_unlocked is declared... no
checking whether getchar_unlocked is declared... yes
checking whether getc_unlocked is declared... yes
checking whether putchar_unlocked is declared... yes
checking whether putc_unlocked is declared... yes
checking for an ANSI C-conforming const... yes
checking sys/mman.h usability... yes
checking sys/mman.h presence... yes
checking for sys/mman.h... yes
checking for mmap... yes
checking whether read-only mmap of a plain file works... yes
checking whether mmap from /dev/zero works... no
checking for MAP_ANON(YMOUS)... yes
checking whether mmap with MAP_ANON(YMOUS) works... no
checking whether to enable maintainer-specific portions of Makefiles... no
updating cache ./config.cache
configure: creating ./config.status
config.status: creating Makefile
config.status: creating mkheaders
config.status: creating config.h
Configuring in libiberty
configure: creating cache ./config.cache
checking whether to enable maintainer-specific portions of Makefiles... no
checking for makeinfo... makeinfo
checking for perl... perl
checking build system type... i686-pc-cygwin
checking host system type... i686-pc-cygwin
checking for i686-pc-cygwin-ar... ar
checking for i686-pc-cygwin-ranlib... ranlib
checking for i686-pc-cygwin-gcc... gcc
checking for C compiler default output file name... a.exe
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... .exe
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking how to run the C preprocessor... gcc -E
checking whether gcc and cc understand -c and -o together... yes
checking for an ANSI C-conforming const... yes
checking for inline... inline
checking whether byte ordering is bigendian... no
checking for a BSD-compatible install... /usr/bin/install -c
checking for sys/file.h... yes
checking for sys/param.h... yes
checking for limits.h... yes
checking for stdlib.h... ye

Re: static array with constant size

2008-03-04 Thread Andrew Pinski
On 3/4/08, Elazar Leibovich <[EMAIL PROTECTED]> wrote:
> I'm trying to compile the following piece of code:
>  static const int ln = 10;
>  static int ar[ln];
>  I'm getting:
>  storage size of 'ar' isn't constant
>  size of variable 'ar' is too large
>   Is the code legal? Can you provide me with references to its legality
>  or a discussion about it? it seems to be compilable with MS cl.exe.

First, this is the wrong list; [EMAIL PROTECTED] is a better list.
Second, this is valid C++98/C++03 but invalid C90/C99.  In C90/C99,
const variables are not integer constant expressions, while in
C++98/C++03, const integral variables initialized with a constant
expression are.
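In C, the usual workaround is an enumeration constant or a macro, both of which are integer constant expressions in C90 and C99. A minimal sketch: ln and ar come from the question, while LN_MACRO, ar2 and the length helpers are illustrative names.

```c
#include <stddef.h>

/* An enumeration constant is an integer constant expression in C,
   so it may size an array with static storage duration. */
enum { ln = 10 };
static int ar[ln];

/* A macro also works, since it expands to the literal 10. */
#define LN_MACRO 10
static int ar2[LN_MACRO];

static size_t ar_len(void)  { return sizeof ar / sizeof ar[0]; }
static size_t ar2_len(void) { return sizeof ar2 / sizeof ar2[0]; }
```

The original `static const int ln = 10;` form remains fine when the file is compiled as C++.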

-- Pinski