Fwd: Re: [PATCH][4.3] Deprecate -ftrapv
Somehow this got stuck in the spam filter. - Forwarded message from [EMAIL PROTECTED] - Date: Sat, 01 Mar 2008 09:21:21 -0500 From: Joern Rennecke <[EMAIL PROTECTED]> Reply-To: Joern Rennecke <[EMAIL PROTECTED]> Subject: Re: [PATCH][4.3] Deprecate -ftrapv To: gcc@gcc.gnu.org Cc: [EMAIL PROTECTED], [EMAIL PROTECTED] On Fri, 29 Feb 2008, Robert Dewar wrote: Well presumably one would want to use target dependent stuff for detecting overflow where it exists (sticky overflow bits on power, O flag on PC, trapping add on MIPS etc). In fact, when I wrote the original -ftrapv code, it was for the sole purpose of using the trapping add on mips. On Sat, 1 Mar 2008, Joseph S. Myers wrote: The only targets defining the v insn patterns at present appear to be alpha and pa. Considering the trouble that you get when you try to generate branches in a non-branch expander, we should probably have alternate named patterns to be used in ports to processors that have no conditional trap facility, or where a conditional trap is more expensive than a well predictable conditional branch. We want arithmetic-and-branch-on-overflow patterns for these. One peculiarity of these patterns would be that they would be required to expand into more than one instruction, since the write of the result must not be in the same instruction as the branch due to reload limitations. Thus the overflow condition in CC0 / other flags register / predicate register has to be actually exposed in rtl to show the dependency between arithmetic and branch. We should document this quirk in the description of these named patterns. When the machine independent expander machinery wants to expand a trapping arithmetic operation that has no matching named pattern defined by the port, and there is no conditional trap defined, it can then use the arithmetic-and-branch-on-overflow pattern to branch to an abort call if an overflow occurs.
To allow branch inversion to work, we don't need to do anything special if the condition is expressed as a comparison against 0 of an 'integer' flag register or a predicate bit. However, if the condition is in CC0 or a CCmode flags register, we want a way to express the overflow and non-overflow conditions so that reverse_condition or REVERSE_CONDITION can do its work. I see two possibilities here. For simplicity I will describe them here in terms of CC0, although many target ports would actually use a scheduler-exposed flags register with an appropriate CCmode mode. - We could have (overflow CC0 0) and (nooverflow CC0 0), where overflow and nooverflow are two new comparison codes, and the trailing 0 is a dummy argument for the sake of consistency with comparison operators. - We could have (ge CC0 overflow) and (lt CC0 overflow), where overflow is a new one-of-a-kind RTX object. - End forwarded message -
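The fallback described in the forwarded message, an arithmetic operation followed by a branch to an abort call on overflow, can be sketched at the C level. This is only an illustration: the helper names add_would_overflow and trapping_add are hypothetical, the overflow test is a portable range check rather than a hardware flag, and nothing here is the RTL pattern under discussion.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: reports whether a + b would overflow int,
   without performing the overflowing addition itself.  */
static int
add_would_overflow (int a, int b)
{
  if (b > 0)
    return a > INT_MAX - b;
  if (b < 0)
    return a < INT_MIN - b;
  return 0;
}

/* Add-and-branch-on-overflow: the common path falls through,
   the overflow path branches to abort ().  */
static int
trapping_add (int a, int b)
{
  if (add_would_overflow (a, b))
    abort ();
  return a + b;
}
```

Keeping the overflow test separate from the branch mirrors the point made above that the condition has to be visible on its own, so that the branch can be inverted or scheduled independently.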
Re: Benchmarks: 7z, bzip2 & gzip.
2008/2/29, J.C. Pizarro <[EMAIL PROTECTED]>: > Here are the results of benchmarks of 3 compressors: 7z, bzip2 and gzip, and > GCCs 3.4.6, 4.1.3-20080225, 4.2.4-20080227, 4.3.0-20080228 & 4.4.0-20080222. Thanks, that's very interesting. I had noticed 4.2 producing 10% larger and 10% slower code for a sample code fragment for ARM but couldn't follow it up. Is there a clause in regressions for "takes longer to compile and produces worse code"? M
Re: GCC 4.3.0 Status Report (2008-03-03)
On Mon, 3 Mar 2008, H.J. Lu wrote: > Hi, > > I'd like to fix > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35453 > > for gcc 4.3. Defines SIDD_XXX in SSE4 header file is a bad idea. SSE 4 > header file > in icc will also be fixed. Works for me. Richard.
Getting GCC to always dllimport vtables on X86?
Hi, Sure hope I've come to the right place... I need to somehow persuade GCC (on x86) to always treat vtables as if they were dllimport'ed. For linking to work on my target platform (a custom X86 OS) it's important that constructors reference vtables indirectly (i.e. through pointers in idata). The other side of this coin is a small hack to ld to allow dllimports to work, not just from importing modules, but from the exporting module as well (i.e. __imp__ symbols for vtables get created automatically once ld detects that they are unresolved). Is there anyone here who would be willing to show me the way with this? Although I am a proficient C/C++ programmer, I am a GNU noob and the GCC source code scares me... :-) I think I've found the right place in GCC - import_export_vtable in decl2.c - but am at a loss to understand the ld source. Many thanks, -- Reuben Harris
Re: atomic accesses
Segher Boessenkool wrote: The Linux kernel, and probably some user-space applications and libraries as well, depend on GCC guaranteeing (a variant of) the following: "any access to a naturally aligned scalar object in memory that is not a bit-field will be performed by a single machine instruction whenever possible" and it seems the current compiler actually does work like this. Seems a pity to have the bit-field exception here, why is it there? Bit-fields will generally require a read-modify-write instruction, and I don't think we actually guarantee to generate one right now. Well if they do require more than one instruction, the rule has no effect ("whenever possible"). If they can be done in one instruction (as on the x86), then why not require this, why make a special case? Because current GCC doesn't work like this AFAIK. I'm aiming for a documentation-only change here, we can always extend it later. Fair enough, we don't want to document something we don't do! Does this rule extend to the use of floating-point instructions to guarantee atomic access to 64-bit long_long_integer, as written it does! Segher
Information regarding issue with While Loop with O3 optimization
Hello all, I am encountering a strange problem. I have a code snippet that contains a while loop. The snippet is as follows: while( (expr1) && (expr2) ); Initially the values of both expr1 and expr2 are set to 1. Next, only the value of expr1 is set to 0 within a SIGINT handler. I compile this program with -O3 optimization. The gcc version information is as follows: gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52) When I run this program, it goes into a tight loop. Now when I send a SIGINT to this program, the program does not exit. I repeat the same procedure by compiling the same code with the -O0 optimization option. Now when I send the SIGINT signal, the program exits. Has anyone encountered similar behavior? Is there any issue with GCC 4.1.1 related to the -O3 optimization option? Regards Raghu.
Re: Information regarding issue with While Loop with O3 optimization
Raghukrishna Hegde wrote: Hello all, I am encountering a strange problem. I have a code snippet that contains a while loop. The snippet is as follows: while( (expr1) && (expr2) ); Initially the values of both expr1 and expr2 are set to 1. Next, only the value of expr1 is set to 0 within a SIGINT handler. You need to make expr1 volatile for this to work; otherwise the optimizer does not have to take into account the possibility of a handler changing a variable.
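A minimal sketch of the advice above, assuming the loop condition is a single flag cleared by the handler. The names keep_running, on_sigint and wait_for_sigint are made up for the illustration; the original program's expr1 and expr2 are not shown in the thread.

```c
#include <signal.h>

/* Written by the handler, read by the loop.  volatile forces a fresh
   load on every iteration; sig_atomic_t is the integer type the C
   standard lets a signal handler store to.  */
static volatile sig_atomic_t keep_running = 1;

static void
on_sigint (int signo)
{
  (void) signo;
  keep_running = 0;
}

/* Installs the handler and spins until SIGINT clears the flag.  */
static void
wait_for_sigint (void)
{
  signal (SIGINT, on_sigint);
  while (keep_running)
    ;   /* the flag is reloaded each time because it is volatile */
}
```

Without the volatile qualifier the compiler may load the flag once before the loop and never look at memory again, which matches the behaviour reported at -O3.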
Re: Interoperability of Fortran array and C vector?
But the remaining question is: can we support type interoperability from Fortran array to C vector? I think this is more a middle-end issue than a Fortran issue, so I'm following up there: can the middle-end VIEW_CONVERT_EXPR between an ARRAY_REF of, say, INTEGER_TYPE (which is what the Fortran array is) and a VECTOR_TYPE? I'm CCing the main GCC devel list, since we might have more answers from there. FX -- François-Xavier Coudert http://www.homepages.ucl.ac.uk/~uccafco/
Re: Swing replacements
On Mon, 3 Mar 2008, [EMAIL PROTECTED] wrote: I have a stand-alone, non-Web-based app. that I'd like to distribute as a .exe with some database files, to a layman audience, and I'd like to avoid issues of JRE distribution and compatibility, etc. So I'm hoping someone, somewhere, has written a replacement framework for Java's GUI classes. Can you by any chance point me in such a direction? I haven't used it myself, but SwingWT (http://swingwt.sourceforge.net/) in combination with SWT might be what you're looking for. See also (http://thisiscool.com/gcc_mingw.htm). BTW, [EMAIL PROTECTED] is a more appropriate list for further discussion, at least concerning GCJ.
Re: atomic accesses
On Mon, Mar 03, 2008 at 11:08:24PM -0500, Robert Dewar wrote: > Segher Boessenkool wrote: > >>>The Linux kernel, and probably some user-space applications and > >>>libraries > >>>as well, depend on GCC guaranteeing (a variant of) the following: > >>> "any access to a naturally aligned scalar object in memory > >>> that is not a bit-field will be performed by a single machine > >>> instruction whenever possible" > >>>and it seems the current compiler actually does work like this. > >>Seems a pity to have the bit-field exception here, why is it there? > > > >Bit-fields will generally require a read-modify-write instruction, > >and I don't think we actually guarantee to generate one right now. > > Well if they do require more than one instruction, the rule has > no effect ("whenever possible"). If they can be done in one > instruction (as on the x86), then why not require this, why > make a special case? Because for the consumers whether the operation is done using a single machine instruction is uninteresting. What matters is if that instruction is atomic. x86 read-modify-write instructions aren't atomic, unless lock prefix is used (and we definitely don't want to use lock prefix on all bitfield accesses) - it actually means there are separate read, modify and write uops. Jakub
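To make the read-modify-write point concrete, here is a sketch of what a bit-field store amounts to when written out by hand. The field position (3 bits at bit offset 3 of a 32-bit word) and the names FIELD_SHIFT, FIELD_WIDTH, FIELD_MASK and store_field are assumptions chosen for the illustration, not anything GCC guarantees.

```c
#include <stdint.h>

/* Hypothetical layout: a 3-bit field at bit offset 3 of a 32-bit word.  */
enum { FIELD_SHIFT = 3, FIELD_WIDTH = 3 };
#define FIELD_MASK ((((uint32_t) 1 << FIELD_WIDTH) - 1) << FIELD_SHIFT)

/* Stores VALUE into the field, leaving the other bits of *WORD alone.  */
static void
store_field (uint32_t *word, uint32_t value)
{
  uint32_t tmp = *word;                              /* read   */
  tmp = (tmp & ~FIELD_MASK)
        | ((value << FIELD_SHIFT) & FIELD_MASK);     /* modify */
  *word = tmp;                                       /* write  */
}
```

If another thread updates a neighbouring field of the same word between the read and the write, that update is lost. Folding the three steps into one instruction does not by itself close the window, which is the distinction drawn above between "single instruction" and "atomic".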
Constrain valid arguments to BIT_FIELD_REF
BIT_FIELD_REF is currently only generated by the middle-end (fold, SRA and parts of the vectorizer). At the moment the bit position and size of the extract can be non-constant and the type of the result is unspecified. I suggest to make sure that bit position and size are constants, the object referenced is of integral type (BIT_FIELD_REF should not be used as a way to circumvent aliasing) and the result type is of the same type as the operand zero type (and not a bitfield type of the referenced size -- in which case the BIT_FIELD_REF_UNSIGNED would be useless). The result would then be properly extended according to BIT_FIELD_REF_UNSIGNED. Is this how it was intended? fold currently optimizes a.b.c == 0 to BIT_FIELD_REF & 1 for bit field field-decls c. IMHO this is bad because it pessimizes TBAA (needs to use a's alias set, not the underlying integral type alias set) and it "breaks" type correctness as arbitrary structure types appear as operand zero. ? Thanks, Richard.
Re: atomic accesses
> Well if they do require more than one instruction, the rule has > no effect ("whenever possible"). If they can be done in one > instruction (as on the x86), then why not require this, why > make a special case? We don't even guarantee consistent behavior for volatile bitfields, so I really doubt we can guarantee it for non-volatile bitfields. In particular "int32_t foo:8;" may use either an 8-bit or a 32-bit access, depending what the compiler feels like. Paul
Re: atomic accesses
I'm really wondering why this is being considered. A documented property of the form "GCC will use a single instruction to do X when possible" means exactly nothing. In particular, to call such a statement a "guarantee" is seriously misleading. If Linux needs the single-instruction property for atomicity, and it thinks it can rely on this supposed property, then Linux has a bug. To do atomic operations, you have to use primitives that are guaranteed always to have the necessary atomicity properties. Typically those would be found in asm statements. I suspect it would be valuable to have standardized primitives for atomic actions (semaphores, spinlocks, test-and-set primitives, circular buffers, pick one). But GCC's load/store semantics are not those primitives, with or without a documented "single instruction when possible" property. paul
Re: Constrain valid arguments to BIT_FIELD_REF
On 3/4/08 10:55 AM, Richard Guenther wrote: I suggest to make sure that bit position and size are constants, the object referenced is of integral type (BIT_FIELD_REF should not be used as a way to circumvent aliasing) and the result type is of the same type as the operand zero type (and not a bitfield type of the referenced size -- in which case the BIT_FIELD_REF_UNSIGNED would be useless). The result would then be properly extended according to BIT_FIELD_REF_UNSIGNED. Is this how it was intended? If it wasn't, I think the semantics you propose are fine. If this is only generated by the ME, it should be easy to change. fold currently optimizes a.b.c == 0 to BIT_FIELD_REF & 1 for bit field field-decls c. IMHO this is bad because it pessimizes TBAA (needs to use a's alias set, not the underlying integral type alias set) and it "breaks" type correctness as arbitrary structure types appear as operand zero. Agreed. Unless this was done to fix some target-specific problem, I think it should disappear. Diego.
Re: atomic accesses
Paul Koning wrote: I'm really wondering why this is being considered. A documented property of the form "GCC will use a single instruction to do X when possible" means exactly nothing. In particular, to call such a statement a "guarantee" is seriously misleading. I agree. If Linux needs the single-instruction property for atomicity, and it thinks it can rely on this supposed property, then Linux has a bug. To do atomic operations, you have to use primitives that are guaranteed always to have the necessary atomicity properties. Yes. Typically those would be found in asm statements. I suspect it would be valuable to have standardized primitives for atomic actions (semaphores, spinlocks, test-and-set primitives, circular buffers, pick one). We already have these in gcc, and they're even documented. Andrew.
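As an illustration of the kind of primitive being referred to, here is a minimal test-and-set spinlock written with GCC's documented __sync builtins. The type and function names (ex_spinlock, ex_spin_trylock, ex_spin_lock, ex_spin_unlock) are hypothetical, and this is a sketch rather than a production lock: it has no back-off and no fairness.

```c
/* Hypothetical lock type: held is 0 when free, 1 when taken.  */
typedef struct { volatile int held; } ex_spinlock;

/* Returns nonzero if the lock was acquired.  */
static int
ex_spin_trylock (ex_spinlock *l)
{
  /* Atomically writes 1 and returns the previous contents.  */
  return __sync_lock_test_and_set (&l->held, 1) == 0;
}

static void
ex_spin_lock (ex_spinlock *l)
{
  while (!ex_spin_trylock (l))
    ;   /* busy-wait until the holder releases */
}

static void
ex_spin_unlock (ex_spinlock *l)
{
  __sync_lock_release (&l->held);   /* writes 0 */
}
```

The atomicity here comes from the builtin, not from any assumption about how the compiler happens to translate an ordinary load or store.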
Re: Constrain valid arguments to BIT_FIELD_REF
On Tue, Mar 04, 2008 at 11:15:00AM -0500, Diego Novillo wrote:
> >fold currently optimizes a.b.c == 0 to BIT_FIELD_REF & 1
> >for bit field field-decls c. IMHO this is bad because it pessimizes
> >TBAA (needs to use a's alias set, not the underlying integral type
> >alias set) and it "breaks" type correctness as arbitrary structure
> >types appear as operand zero.
>
> Agreed. Unless this was done to fix some target-specific problem, I
> think it should disappear.

Perhaps not in early GIMPLE passes, but we certainly want to lower bitfield accesses to BIT_FIELD_REFs or something similar before expansion, otherwise expander and RTL optimization passes aren't able to optimize anything but the most trivial cases. GCC generates terrible code for bitfields ATM, try say:

struct S
{
  unsigned int a : 3;
  unsigned int b : 3;
  unsigned int c : 3;
  unsigned int d : 3;
  unsigned int e : 3;
  unsigned int f : 3;
  unsigned int g : 3;
  unsigned int h : 11;
} a, b, c;

void foo (void)
{
  a.a = b.a | c.a;
  a.b = b.b | c.b;
  a.c = b.c | c.c;
  a.d = b.d | c.d;
  a.e = b.e | c.e;
  a.f = b.f | c.f;
  a.g = b.g | c.g;
  a.h = b.h | c.h;
}

which could be optimized into

  BIT_FIELD_REF <a, 32, 0> = BIT_FIELD_REF <b, 32, 0> | BIT_FIELD_REF <c, 32, 0>;

so something like 3 or 4 instructions, yet we generate 51. Operating on adjacent bitfield fields is fairly common. Similarly (and perhaps far more common in the wild) is e.g.

void bar (void)
{
  a.a = 1;
  a.b = 2;
  a.c = 3;
  a.d = 4;
  a.e = 5;
  a.f = 6;
  a.g = 7;
  a.h = 8;
}

- on x86_64 24 instructions on the trunk, 1 is enough.

RTL is too late to try to optimize this, I've tried that once. Given combiner's limitation of only trying to combine 3 instructions at once, we'd need more. So this is something that needs to be optimized at the tree level, either by having a separate pass that takes care of it, or by lowering it early enough into something that the optimizers will handle.

Jakub
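The "1 is enough" claim for bar () can be checked by hand: the eight constants fold into a single 32-bit value, and the eight ORs in foo () fold into one word-wide OR. The sketch below does the packing with explicit shifts, assuming the first field sits in the lowest bits (what GCC does on little-endian targets such as x86_64; bit-field layout is ABI-dependent in general). The function names pack_fields and or_all_fields are hypothetical.

```c
#include <stdint.h>

/* Packs eight field values into one 32-bit word: seven 3-bit fields
   followed by one 11-bit field, first field in the lowest bits.
   The layout is an assumption made for this illustration.  */
static uint32_t
pack_fields (const uint32_t v[8])
{
  uint32_t word = 0;
  for (int i = 0; i < 7; i++)
    word |= (v[i] & 0x7u) << (3 * i);
  word |= (v[7] & 0x7FFu) << 21;
  return word;
}

/* The whole of foo () as one operation on the containing words.  */
static uint32_t
or_all_fields (uint32_t b, uint32_t c)
{
  return b | c;
}
```

With the values from bar (), pack_fields yields 0x11F58D1, so under the assumed layout the whole function reduces to storing that one constant.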
Re: Constrain valid arguments to BIT_FIELD_REF
On Tue, 4 Mar 2008, Jakub Jelinek wrote: > On Tue, Mar 04, 2008 at 11:15:00AM -0500, Diego Novillo wrote: > > >fold currently optimizes a.b.c == 0 to BIT_FIELD_REF & 1 > > >for bit field field-decls c. IMHO this is bad because it pessimizes > > >TBAA (needs to use a's alias set, not the underlying integral type > > >alias set) and it "breaks" type correctness as arbitrary structure > > >types appear as operand zero. > > > > Agreed. Unless this was done to fix some target-specific problem, I > > think it should disappear. > > Perhaps not in early GIMPLE passes, but we certainly want to lower > bitfield accesses to BIT_FIELD_REFs or something similar before expansion, > otherwise expander and RTL optimization passes aren't able to optimize but > the most trivial cases. GCC generates for bitfields terrible code ATM, > try say: > struct S > { > unsigned int a : 3; > unsigned int b : 3; > unsigned int c : 3; > unsigned int d : 3; > unsigned int e : 3; > unsigned int f : 3; > unsigned int g : 3; > unsigned int h : 11; > } a, b, c; > > void foo (void) > { > a.a = b.a | c.a; > a.b = b.b | c.b; > a.c = b.c | c.c; > a.d = b.d | c.d; > a.e = b.e | c.e; > a.f = b.f | c.f; > a.g = b.g | c.g; > a.h = b.h | c.h; > } > which could be optimized into BIT_FIELD_REF = BIT_FIELD_REF 32, 0> | BIT_FIELD_REF ; > so something like 3 or 4 instructions, yet we generate 51. > Operating on adjacent bitfield fields is fairly common. > Similarly (and perhaps far more common in the wild) is e.g. > void bar (void) > { > a.a = 1; > a.b = 2; > a.c = 3; > a.d = 4; > a.e = 5; > a.f = 6; > a.g = 7; > a.h = 8; > } > - on x86_64 24 instructions on the trunk, 1 is enough. > RTL is too late to try to optimize this, I've tried that once. > Given combiner's limitation of only trying to combine 3 instructions > at once, we'd need more. 
So this is something that needs to > be optimized at the tree level, either by having a separate pass > that takes care of it, or by lowering it early enough into something > that the optimizers will handle. Sure. With 4.3 SRA tries to do this. With the MEM_REF lowering I have we optimize the above to foo () { unsigned int MEML.2; unsigned int MEML.1; unsigned int MEML.0; : MEML.0 = MEM ; MEML.1 = MEM ; MEML.2 = MEM ; (load all three words once) MEM = BIT_FIELD_EXPR ) ((unsigned char) BIT_FIELD_REF | (unsigned char) BIT_FIELD_REF ), 3, 0>, () ((unsigned char) BIT_FIELD_REF | (unsigned char) BIT_FIELD_REF ), 3, 3>, () ((unsigned char) BIT_FIELD_REF | (unsigned char) BIT_FIELD_REF ), 3, 6>, () ((unsigned char) BIT_FIELD_REF | (unsigned char) BIT_FIELD_REF ), 3, 9>, () ((unsigned char) BIT_FIELD_REF | (unsigned char) BIT_FIELD_REF ), 3, 12>, () ((unsigned char) BIT_FIELD_REF | (unsigned char) BIT_FIELD_REF ), 3, 15>, () ((unsigned char) BIT_FIELD_REF | (unsigned char) BIT_FIELD_REF ), 3, 18>, () ((short unsigned int) BIT_FIELD_REF | (short unsigned int) BIT_FIELD_REF ), 11, 21>; return; } TER makes a mess out of the expression and obviously we miss some expression combining here (I only have trivial constant folding implemented for BIT_FIELD_EXPR right now). Richard.
Re: atomic accesses
On Tue, Mar 04, 2008 at 04:37:29PM +, Andrew Haley wrote: > >Typically those would be found in asm statements. > > >I suspect it would be valuable to have standardized primitives for > >atomic actions (semaphores, spinlocks, test-and-set primitives, > >circular buffers, pick one). > > We already have these in gcc, and they're even documented. We don't have atomic read or atomic write builtins (ok, you could abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop with __sync_val_compare_and_swap for atomic store, but that's horrible overkill). Being able to assume that for non-bitfield accesses bigger than certain minimum size, smaller or equal to the word size and naturally aligned the compiler will read or write a value in one lump is certainly desirable and many programs assume it heavily (starting with glibc, kernel, libgomp, ...). The "certain minimum size" is typically either size of char, or (e.g. on old alphas) size of int. Typically the programs care about atomicity of accesses to int, long and pointer sized vars, e.g. have only threads in a critical section modify a variable, but be able to read that variable outside of critical section and see only values that were written in the critical section, not say half of an old value and half of a new value. Jakub
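For reference, the workaround described above might look as follows. This is a sketch of the "overkill" approach, not a recommendation; the wrapper names atomic_read_long and atomic_store_long are hypothetical, and the compare-and-swap builtin is used under the spelling in GCC's documentation, __sync_val_compare_and_swap.

```c
/* Atomic read by adding zero: returns the value that was in *p.  */
static long
atomic_read_long (long *p)
{
  return __sync_fetch_and_add (p, 0);
}

/* Atomic store built from a compare-and-swap loop.  */
static void
atomic_store_long (long *p, long newval)
{
  long old = *p;   /* initial guess; a stale guess only costs a retry */
  for (;;)
    {
      long seen = __sync_val_compare_and_swap (p, old, newval);
      if (seen == old)
        break;       /* the swap took place */
      old = seen;    /* someone else wrote in between; retry */
    }
}
```

Both wrappers issue a full read-modify-write with barriers where a plain aligned load or store would do on most targets, which is why the message calls this overkill.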
Re: Constrain valid arguments to BIT_FIELD_REF
On 3/4/08, Richard Guenther <[EMAIL PROTECTED]> wrote: > I suggest to make sure that bit position and size are constants, the > object referenced is of integral type (BIT_FIELD_REF should not be > used as a way to circumvent aliasing) and the result type is of the > same type as the operand zero type (and not a bitfield type of the > referenced size -- in which case the BIT_FIELD_REF_UNSIGNED would > be useless). The result would then be properly extended according > to BIT_FIELD_REF_UNSIGNED. I tried non constant bit position with BIT_FIELD_REF of vector types and it crashed in expand so I think this is the correct thing to do. Though it would be nice if we have a VEC_EXTRACT tree instead of overloading BIT_FIELD_REF for it that takes a non constant position so we can do better optimization there in some cases (yes people write code that extracts parts of vectors, trust me). -- Pinski
Re: atomic accesses
Jakub Jelinek wrote: On Tue, Mar 04, 2008 at 04:37:29PM +, Andrew Haley wrote: Typically those would be found in asm statements. I suspect it would be valuable to have standardized primitives for atomic actions (semaphores, spinlocks, test-and-set primitives, circular buffers, pick one). We already have these in gcc, and they're even documented. We don't have atomic read or atomic write builtins (ok, you could abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop with __sync_val_compare_and_swap for atomic store, but that's horrible overkill). Being able to assume that for non-bitfield accesses bigger than certain minimum size, smaller or equal to the word size and naturally aligned the compiler will read or write a value in one lump is certainly desirable and many programs assume it heavily (starting with glibc, kernel, libgomp, ...). That seems reasonable, but I suspect that coming up with wordage to describe it sufficiently formally for all cases will be tricky. AFAIK the only reason we don't break this rule is that doing so would be grossly inefficient; there's nothing to stop any gcc back-end with (say) seriously slow DImode writes from using two SImode writes instead. The "certain minimum size" is typically either size of char, or (e.g. on old alphas) size of int. Typically the programs care about atomicity of accesses to int, long and pointer sized vars, e.g. have only threads in a critical section modify a variable, but be able to read that variable outside of critical section and see only values that were written in the critical section, not say half of an old value and half of a new value. Andrew.
Re: atomic accesses
> "Andrew" == Andrew Haley <[EMAIL PROTECTED]> writes: >> We don't have atomic read or atomic write builtins (ok, you could >> abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop with >> __sync_val_compare_and_swap for atomic store, but that's >> horrible overkill). Being able to assume that for non-bitfield >> accesses bigger than certain minimum size, smaller or equal to the >> word size and naturally aligned the compiler will read or write a >> value in one lump is certainly desirable and many programs assume >> it heavily (starting with glibc, kernel, libgomp, ...). Andrew> That seems reasonable, but I suspect that coming up with Andrew> wordage to describe it sufficiently formally for all cases Andrew> will be tricky. Probably so. In any case, clearly no such wordage could include the phrase "when possible". Andrew> AFAIK the only reason we don't break this rule is that doing Andrew> so would be grossly inefficient; there's nothing to stop any Andrew> gcc back-end with (say) seriously slow DImode writes from Andrew> using two SImode writes instead. Or, say, because you're using the MIPS O32 ABI rather than the N32/N64 ABI... which is yet another example why a formal rule is tricky. >> The "certain minimum size" is typically either size of char, or >> (e.g. on old alphas) size of int. Not just old Alphas. Even if the instruction set has a "store character" opcode, the hardware is going to do a read-modify-write internally; is that an atomic operation? Not necessarily. Perhaps even "only rarely". paul
Re: Constrain valid arguments to BIT_FIELD_REF
On Tue, 4 Mar 2008, Andrew Pinski wrote: > On 3/4/08, Richard Guenther <[EMAIL PROTECTED]> wrote: > > I suggest to make sure that bit position and size are constants, the > > object referenced is of integral type (BIT_FIELD_REF should not be > > used as a way to circumvent aliasing) and the result type is of the > > same type as the operand zero type (and not a bitfield type of the > > referenced size -- in which case the BIT_FIELD_REF_UNSIGNED would > > be useless). The result would then be properly extended according > > to BIT_FIELD_REF_UNSIGNED. > > I tried non constant bit position with BIT_FIELD_REF of vector types > and it crashed in expand so I think this is the correct thing to do. > Though it would be nice if we have a VEC_EXTRACT tree instead of > overloading BIT_FIELD_REF for it that takes a non constant position so > we can do better optimization there in some cases (yes people write > code that extracts parts of vectors, trust me). FWIW I agree. After all we also have REALPART_EXPR and IMAGPART_EXPR, a VEC_EXTRACT sounds fine (after all we already have a lot of VEC_ codes). At least BIT_FIELD_REF should not be VIEW_CONVERT_EXPR on steroids. Richard.
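For context, this is the sort of user code that ends up needing an element extract with a non-constant position. The sketch relies on GCC's vector extension and on vector subscripting, so it assumes a GCC release that supports indexing vector types in C; the typedef v4si and the function vec_extract are names picked for the illustration and say nothing about how the middle-end represents the operation.

```c
/* Four ints in one 16-byte vector (GCC vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Returns element IDX of *V; IDX is reduced modulo 4 so that the
   access always stays inside the vector.  */
static int
vec_extract (const v4si *v, unsigned idx)
{
  return (*v)[idx & 3];
}
```

Because idx is only known at run time, the access cannot be described by a bit position that is a compile-time constant, which is the case a dedicated VEC_EXTRACT code would cover.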
Re: Benchmarks: 7z, bzip2 & gzip.
On Tue, Mar 04, 2008 at 09:02:34AM +, Martin Guy wrote: > Is there a clause in regressions for "takes longer to compile and > produces worse code"? Worse code is a regression, so is slower compile time. Both are judgement calls; some of them are not going to be changed, but safe patches changing them are allowed on regression-only branches. -- Daniel Jacobowitz CodeSourcery
[4.3/4.4]: PATCH: PR target/35453: nmmintrin.h defines macros SIDD_XXX
Hi, Here is the patch for both gcc 4.3 and 4.4. OK for 4.3/4.4? Tested on Linux/ia32 and Linux/ia64 with gcc 4.3/4.4. Thanks. H.J.

On Tue, Mar 4, 2008 at 1:19 AM, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On Mon, 3 Mar 2008, H.J. Lu wrote:
>
> > Hi,
> >
> > I'd like to fix
> >
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35453
> >
> > for gcc 4.3. Defines SIDD_XXX in SSE4 header file is a bad idea. SSE 4
> > header file
> > in icc will also be fixed.
>
> Works for me.
>
> Richard.
>

gcc/
2008-03-03 H.J. Lu <[EMAIL PROTECTED]>

	PR target/35453
	* config/i386/smmintrin.h (SIDD_XXX): Renamed to ...
	(_SIDD_XXX): This.

gcc/testsuite/
2008-03-03 H.J. Lu <[EMAIL PROTECTED]>

	PR target/35453
	* gcc.target/i386/sse4_2-pcmpestri-1.c: Replace SIDD_XXX with _SIDD_XXX.
	* gcc.target/i386/sse4_2-pcmpestri-2.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpestrm-1.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpestrm-2.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpistri-1.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpistri-2.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpistrm-1.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpistrm-2.c: Likewise.
	* gcc.target/i386/sse4_2-pcmpstr.h: Likewise.

--- gcc/config/i386/smmintrin.h.sidd 2007-12-15 15:49:23.0 -0800
+++ gcc/config/i386/smmintrin.h 2008-03-03 20:22:22.0 -0800
@@ -470,30 +470,30 @@ _mm_stream_load_si128 (__m128i *__X)
 #ifdef __SSE4_2__
 /* These macros specify the source data format. */
-#define SIDD_UBYTE_OPS 0x00
-#define SIDD_UWORD_OPS 0x01
-#define SIDD_SBYTE_OPS 0x02
-#define SIDD_SWORD_OPS 0x03
+#define _SIDD_UBYTE_OPS 0x00
+#define _SIDD_UWORD_OPS 0x01
+#define _SIDD_SBYTE_OPS 0x02
+#define _SIDD_SWORD_OPS 0x03
 /* These macros specify the comparison operation. */
-#define SIDD_CMP_EQUAL_ANY 0x00
-#define SIDD_CMP_RANGES 0x04
-#define SIDD_CMP_EQUAL_EACH 0x08
-#define SIDD_CMP_EQUAL_ORDERED 0x0c
+#define _SIDD_CMP_EQUAL_ANY 0x00
+#define _SIDD_CMP_RANGES 0x04
+#define _SIDD_CMP_EQUAL_EACH 0x08
+#define _SIDD_CMP_EQUAL_ORDERED 0x0c
 /* These macros specify the the polarity. */
-#define SIDD_POSITIVE_POLARITY 0x00
-#define SIDD_NEGATIVE_POLARITY 0x10
-#define SIDD_MASKED_POSITIVE_POLARITY 0x20
-#define SIDD_MASKED_NEGATIVE_POLARITY 0x30
+#define _SIDD_POSITIVE_POLARITY 0x00
+#define _SIDD_NEGATIVE_POLARITY 0x10
+#define _SIDD_MASKED_POSITIVE_POLARITY 0x20
+#define _SIDD_MASKED_NEGATIVE_POLARITY 0x30
 /* These macros specify the output selection in _mm_cmpXstri (). */
-#define SIDD_LEAST_SIGNIFICANT 0x00
-#define SIDD_MOST_SIGNIFICANT 0x40
+#define _SIDD_LEAST_SIGNIFICANT 0x00
+#define _SIDD_MOST_SIGNIFICANT 0x40
 /* These macros specify the output selection in _mm_cmpXstrm (). */
-#define SIDD_BIT_MASK 0x00
-#define SIDD_UNIT_MASK 0x40
+#define _SIDD_BIT_MASK 0x00
+#define _SIDD_UNIT_MASK 0x40
 /* Intrinsics for text/string processing. */
--- gcc/testsuite/gcc.target/i386/sse4_2-pcmpestri-1.c.sidd 2007-08-23 09:44:31.0 -0700
+++ gcc/testsuite/gcc.target/i386/sse4_2-pcmpestri-1.c 2008-03-03 20:23:20.0 -0800
@@ -8,15 +8,15 @@
 #define NUM 1024
 #define IMM_VAL0 \
-  (SIDD_SBYTE_OPS | SIDD_CMP_RANGES | SIDD_MASKED_POSITIVE_POLARITY)
+  (_SIDD_SBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_MASKED_POSITIVE_POLARITY)
 #define IMM_VAL1 \
-  (SIDD_UBYTE_OPS | SIDD_CMP_EQUAL_EACH | SIDD_NEGATIVE_POLARITY \
-   | SIDD_MOST_SIGNIFICANT)
+  (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_NEGATIVE_POLARITY \
+   | _SIDD_MOST_SIGNIFICANT)
 #define IMM_VAL2 \
-  (SIDD_UWORD_OPS | SIDD_CMP_EQUAL_ANY | SIDD_MASKED_NEGATIVE_POLARITY)
+  (_SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_MASKED_NEGATIVE_POLARITY)
 #define IMM_VAL3 \
-  (SIDD_SWORD_OPS | SIDD_CMP_EQUAL_ORDERED \
-   | SIDD_MASKED_NEGATIVE_POLARITY | SIDD_LEAST_SIGNIFICANT)
+  (_SIDD_SWORD_OPS | _SIDD_CMP_EQUAL_ORDERED \
+   | _SIDD_MASKED_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT)
 static void
--- gcc/testsuite/gcc.target/i386/sse4_2-pcmpestri-2.c.sidd 2007-08-23 09:44:31.0 -0700
+++ gcc/testsuite/gcc.target/i386/sse4_2-pcmpestri-2.c 2008-03-03 20:23:25.0 -0800
@@ -8,15 +8,15 @@
 #define NUM 1024
 #define IMM_VAL0 \
-  (SIDD_SBYTE_OPS | SIDD_CMP_RANGES | SIDD_MASKED_POSITIVE_POLARITY)
+  (_SIDD_SBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_MASKED_POSITIVE_POLARITY)
 #define IMM_VAL1 \
-  (SIDD_UBYTE_OPS | SIDD_CMP_EQUAL_EACH | SIDD_NEGATIVE_POLARITY \
-   | SIDD_MOST_SIGNIFICANT)
+  (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_NEGATIVE_POLARITY \
+   | _SIDD_MOST_SIGNIFICANT)
 #define IMM_VAL2 \
-  (SIDD_UWORD_OPS | SIDD_CMP_EQUAL_ANY | SIDD_MASKED_NEGATIVE_POLARITY)
+  (_SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_MASKED_NEGATIVE_POLARITY)
 #define IMM_VAL3 \
-  (SIDD_SWORD_OPS | SIDD_CMP_EQUAL_ORDERED \
-   | SIDD_MASKED_NEGATIVE_POLARITY | SIDD_LEAST_SIGNIFICANT)
+  (_SIDD_SWORD_OPS | _SIDD_CMP_EQUAL_ORDERED \
+   | _SIDD_MASKED_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT)
 static void
---
gcc/testsuite/gcc.tar
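The selector values in the patch are meant to be OR-ed into a single 8-bit immediate, as the IMM_VAL macros in the test files do. The sketch below reproduces two of those combinations with local copies of the values taken from the patch. The EX_ prefix and the function names are hypothetical, chosen so that the example compiles without the SSE4.2 header and does not itself define reserved identifiers.

```c
/* Local illustrative copies of selector values from the patch.
   The real header spells these _SIDD_*; EX_SIDD_* is made up here.  */
enum
{
  EX_SIDD_UBYTE_OPS                = 0x00,
  EX_SIDD_SBYTE_OPS                = 0x02,
  EX_SIDD_CMP_RANGES               = 0x04,
  EX_SIDD_CMP_EQUAL_EACH           = 0x08,
  EX_SIDD_NEGATIVE_POLARITY        = 0x10,
  EX_SIDD_MASKED_POSITIVE_POLARITY = 0x20,
  EX_SIDD_MOST_SIGNIFICANT         = 0x40
};

/* Same combination as IMM_VAL0 in the test files.  */
static int
ex_imm_val0 (void)
{
  return EX_SIDD_SBYTE_OPS | EX_SIDD_CMP_RANGES
         | EX_SIDD_MASKED_POSITIVE_POLARITY;
}

/* Same combination as IMM_VAL1 in the test files.  */
static int
ex_imm_val1 (void)
{
  return EX_SIDD_UBYTE_OPS | EX_SIDD_CMP_EQUAL_EACH
         | EX_SIDD_NEGATIVE_POLARITY | EX_SIDD_MOST_SIGNIFICANT;
}
```

Names without the leading underscore live in the user's namespace, so a program that happened to define its own SIDD_RANGES-style macro could collide with the header; moving them to _SIDD_* puts them in the space reserved for the implementation.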
Re: atomic accesses
Jakub Jelinek wrote: On Tue, Mar 04, 2008 at 04:37:29PM +, Andrew Haley wrote: Typically those would be found in asm statements. I suspect it would be valuable to have standardized primitives for atomic actions (semaphores, spinlocks, test-and-set primitives, circular buffers, pick one). We already have these in gcc, and they're even documented. We don't have atomic read or atomic write builtins (ok, you could abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop with __sync_val_compare_and_swap for atomic store, but that's horrible overkill). There is nothing preventing us from adding __sync_fetch and __sync_store so that we could avoid the overkill. Perhaps anything declared volatile should have these semantics. Although mentioning 'volatile' on the lkml is probably not a good idea. David Daney
Re: atomic accesses
On Tue, Mar 4, 2008 at 7:31 PM, David Daney <[EMAIL PROTECTED]> wrote: > Jakub Jelinek wrote: > > On Tue, Mar 04, 2008 at 04:37:29PM +, Andrew Haley wrote: > >>> Typically those would be found in asm statements. > >>> I suspect it would be valuable to have standardized primitives for > >>> atomic actions (semaphores, spinlocks, test-and-set primitives, > >>> circular buffers, pick one). > >> We already have these in gcc, and they're even documented. > > > > We don't have atomic read or atomic write builtins (ok, you could > > abuse __sync_fetch_and_add (&x, 0) for atomic read and a loop > > with __sync_val_compare_and_swap for atomic store, but that's horrible > > overkill). > > There is nothing preventing us from adding __sync_fetch and __sync_store > so that we could avoid the overkill. > > Perhaps anything declared volatile should have these semantics. > Although mentioning 'volatile' on the lkml is probably not a good idea. Certainly not. volatile has nothing to do with atomic access. Richard.
Re: atomic accesses
Richard Guenther wrote:
> On Tue, Mar 4, 2008 at 7:31 PM, David Daney <[EMAIL PROTECTED]> wrote:
>> Perhaps anything declared volatile should have these semantics. Although mentioning 'volatile' on the lkml is probably not a good idea.
>
> Certainly not. volatile has nothing to do with atomic access.

It was more of an idle thought than a proposal.

David Daney
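The workaround Jakub describes in this thread can be sketched as follows. This is only an illustration under assumptions: the helper names atomic_read_via_add and atomic_store_via_cas are made up here, the operand type is fixed to int32_t for brevity, and the __sync_fetch and __sync_store builtins David proposes are not used because the thread presents them as an idea, not as something GCC provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Atomic read by "abusing" fetch-and-add with an addend of 0: the
   stored value is left unchanged and the builtin returns what it saw.
   (Hypothetical helper name, not a GCC builtin.) */
static inline int32_t atomic_read_via_add(int32_t *p)
{
    assert(p != NULL);
    return __sync_fetch_and_add(p, 0);
}

/* Atomic store as a compare-and-swap loop: keep retrying until the
   swap succeeds against the value we last observed.
   (Hypothetical helper name, not a GCC builtin.) */
static inline void atomic_store_via_cas(int32_t *p, int32_t v)
{
    int32_t seen = atomic_read_via_add(p);
    for (;;) {
        int32_t prev = __sync_val_compare_and_swap(p, seen, v);
        if (prev == seen)
            break;       /* the swap happened */
        seen = prev;     /* another writer got in first; retry */
    }
}
```

Both helpers issue a full read-modify-write operation where a plain load or store would do, which is the overkill the thread complains about.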
Re: birthpoints in rtl.
On Sat, Mar 1, 2008 at 2:46 PM, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
> > By the way, I still don't understand how birth points would work. Can someone give an example of what the insn stream would look like with birth points, and what the DU/UD chains would look like?
>
> With a big IIUC, and using a high-level IR for simplicity
>
>     if (a < 5) goto BB1; else goto BB2;
>     BB1: b = 3; goto BB3;
>     BB2: b = c; goto BB3;
>     BB3: return b * b;
>
> DF info for b:
>     insn "b = 3" has def D1
>     insn "b = c" has def D2
>     insn "return b * b" has use U1 (use-def chain [D1,D2]) and use U2 (use-def chain [D1,D2])
>
> becomes
>
>     if (a < 5) goto BB1; else goto BB2;
>     BB1: b = 3; goto BB3;
>     BB2: b = c; goto BB3;
>     BB3: b = b; return b * b;
>
> DF info for b:
>     insn "b = 3" has def D1
>     insn "b = c" has def D2
>     birthpoint "b = b" has use U1 (use-def chain [D1,D2]) and def D3
>     insn "return b * b" has use U2 (use-def chain [D3]) and use U3 (use-def chain [D3])
>
> Basically the only non-singleton UD chains are for a birthpoint's RHS, and the UD chains of birthpoints correspond to PHI operands. The singleton UD chains correspond to subscripted SSA variables. I think this is isomorphic to FUD chains.

Thanks for this explanation! I had the feeling that FUD chains and birth points are just different names for basically the same thing. That also seems to be your interpretation.

So, from an implementation standpoint, would we make PHI-like UD-chains to nop insns that represent the birth points, or would we actually add PHI functions and let the "normal" UD-chains point to the PHI function arguments?

What about keeping things up-to-date after applying some transformations? It is already hard to keep UD/DU chains up-to-date now (I don't think any pass successfully does so right now). This should be a lot easier if you fully factorize your UD chains, right?

Gr.
Steven
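The effect of the birthpoint on the size of the UD web in Paolo's example can be sketched with a little arithmetic. This is a simplified model, assuming every use below the join is reached by the same set of defs; the function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Unfactored: every use below the join links to every reaching def. */
static size_t ud_links_unfactored(size_t ndefs, size_t nuses)
{
    return ndefs * nuses;
}

/* Factored: the birthpoint's single use links to all reaching defs,
   and every real use links only to the birthpoint's own def. */
static size_t ud_links_factored(size_t ndefs, size_t nuses)
{
    assert(ndefs > 0);
    return ndefs + nuses;
}
```

With the two defs and two uses of the example, both forms need four links, so the example itself only breaks even. The saving shows up as the counts grow: four defs reaching five uses need twenty links unfactored and nine factored.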
Re: birthpoints in rtl.
On 3/4/08 1:53 PM, Steven Bosscher wrote:
> So, from an implementation standpoint, would we make PHI-like UD-chains to nop insns that represent the birth points, or would we actually add PHI functions and let the "normal" UD-chains point to the PHI function arguments?

Why put them in the IL stream at all? All you need is to have these factoring devices in the DF web. It really doesn't need to be part of the IL stream.

Both PHIs and birthpoints are merely factoring devices that let you cut down the number of UD links. They don't need to be part of the IL, much like none of the DF objects are part of the RTL IL.

> What about keeping things up-to-date after applying some transformations? It is already hard to keep UD/DU chains up-to-date now (I don't think any pass successfully does so right now). This should be a lot easier if you fully factorize your UD chains, right?

In theory, yes. Code for keeping these things up-to-date already exists in GIMPLE SSA.

Diego.
Re: birthpoints in rtl.
On Sat, Mar 1, 2008 at 11:13 AM, Jan Hubicka <[EMAIL PROTECTED]> wrote:
> > Diego,
> >
> > I am leaning to just adding noop moves at the birthpoints (dominance frontiers) as real noop move insns in the streams in the passes that use ud or du chains. The back end is tolerant of noop moves and without
>
> Hi,
> while I am with Diego in that I would prefer PHI nodes on the side, especially in FUD chains where the rest of your SSA is on the side too. But if we go with the extra instruction scheme, I think you are much better off using the RTL USE instruction. The moves are generated by target machinery and can do funny things, like clobbering flags or whatever. USEs are transparent this way.

I think that without the extra instruction it will be difficult to keep the FUD chains up to date. One of the nice features of the extra instruction (be it a PHI, trivial move, or a USE) is that it explicitly gives you a new location for a DEF. That makes updating things a lot easier, I suspect.

Consider e.g. replacing an operand in an insn. Because you know that only the DEF of the operand in the extra instruction will reach, you can update UD chains right away. This is harder if there are more reaching defs. The most trivial example I can think of is e.g.

    if (c)
      x = ...;    D1(x)
    else
      x = ...;    D2(x)

    y = x;        D4(y), U2(x), UD(U2,D1), UD(U2,D2)
    z = y;        D5(z), U3(y), UD(U3,D4)

If you copy propagate x to z, you have to add an extra UD chain. With the extra instruction, you don't:

    if (c)
      x = ...;    D1(x)
    else
      x = ...;    D2(x)

    x = x;        D3(x), U1(x), UD(U1,D1|D2)
    y = x;        D4(y), U2(x), UD(U2,D3)
    z = y;        D5(z), U3(y), UD(U3,D4)

For the location of the extra instructions, I would *not* keep them on the side. If you have something special going on, my motto is: "Make it explicit".

Gr.
Steven
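Steven's copy propagation example can be sketched as a chain update. This is a minimal model and not GCC's df code: struct use, MAX_DEFS and propagate_copy are invented for the illustration, and it assumes the copy's source register is not redefined between the copy and the rewritten use.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFS 8

/* One use of a register together with its use-def chain, stored as
   the numeric ids of the reaching defs (D1 -> 1, D2 -> 2, ...). */
struct use {
    char reg;              /* register name, e.g. 'x' or 'y' */
    size_t ndefs;          /* length of the use-def chain */
    int defs[MAX_DEFS];
};

/* Rewrite a use of the copy's destination so that it reads the copy's
   source instead.  The rewritten use takes over the chain of the
   copy's own source use. */
static void propagate_copy(struct use *target, const struct use *copy_src)
{
    assert(copy_src->ndefs <= MAX_DEFS);
    target->reg = copy_src->reg;
    target->ndefs = copy_src->ndefs;
    for (size_t i = 0; i < copy_src->ndefs; i++)
        target->defs[i] = copy_src->defs[i];
}
```

Without the extra instruction, the use U3 in "z = y" ends up with the two links D1 and D2 after propagation. With the extra "x = x", it ends up with the single link D3, so the update never has to grow a chain.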
Re: birthpoints in rtl.
On Tue, Mar 4, 2008 at 7:58 PM, Diego Novillo <[EMAIL PROTECTED]> wrote:
> Both PHIs and birthpoints are merely factoring devices that let you cut down the number of UD links. They don't need to be part of the IL, much like none of the DF objects are part of the RTL IL.

Maybe they don't need to be, but it may be useful to have them anyway.

> > What about keeping things up-to-date after applying some transformations? It is already hard to keep UD/DU chains up-to-date now (I don't think any pass successfully does so right now). This should be a lot easier if you fully factorize your UD chains, right?
>
> In theory, yes. Code for keeping these things up-to-date already exists in GIMPLE SSA.

That code is IMHO just awfully ugly. And slow too, last I checked. I don't know if this is still true, but update_ssa used to walk a huge part of the dominator tree even for seemingly trivial updates. We should not want that on RTL. I don't think we should allow transformations on RTL for which manually updating the FUD chains is too hard.

Gr.
Steven
Re: birthpoints in rtl.
On 3/4/08 2:12 PM, Steven Bosscher wrote:
> That code is IMHO just awfully ugly. And slow too, last I checked.

Yeah, there's quite a bit of bookkeeping needed to do incremental SSA updates.

> We should not want that on RTL. I don't think we should allow transformations on RTL that are too hard to manually update the FUD chains somehow.

For FUD chains, the incremental work is actually easier to code. You need to re-write the FUD chains for the affected objects from scratch. Forcing the optimizers to maintain this themselves is certainly easier for the framework code, but it imposes a heavier burden on the optimizers.

Diego.
Re: Benchmarks: 7z, bzip2 & gzip.
J.C. Pizarro wrote:
> p7zip-4.57
> [...]
> 1. 1m50s compile, 1630164 file, 1618639 text, 6120 data, 27168 bss, 5m50s run.
> 2. 1m53s compile, 1665952 file, 1649829 text, 4668 data, 29160 bss, 6m04s run.
> 3. 2m08s compile, 1629088 file, 1613313 text, 4672 data, 29160 bss, 5m54s run.
> 4. 2m36s compile, 2063216 file, 2047420 text, 4380 data, 29160 bss, 6m14s run.
> 5. 2m30s compile, 1976228 file, 1960164 text, 4380 data, 29160 bss, 6m12s run.

Has anybody analyzed this code size and performance regression? For simplicity, we could limit our comparison to 3 (gcc-4.2.4) against 4 (gcc-4.3.0) or 5 (gcc-4.4.0).

For the performance regression, one could proceed like this:

 - profile the 7z run to find the hot-spot(s)
 - disassemble the output of both compilers
 - look for obvious pessimizations in the hot parts

We're looking for a 6-7% change in time, so it may very well be a single instruction.

--
  \___/
 |___|  Bernardo Innocenti - http://www.codewiz.org/
  \___\ One Laptop Per Child - http://www.laptop.org/
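The size of the regression can be worked out from the figures quoted above. The two helpers below are a small sketch with made-up names; they only turn the listed times and sizes into percentages.

```c
#include <assert.h>

/* Relative change from `base` to `value`, in percent. */
static double percent_change(double base, double value)
{
    assert(base > 0.0);
    return (value - base) / base * 100.0;
}

/* Convert an "XmYYs" wall-clock time to seconds. */
static double to_seconds(int minutes, int seconds)
{
    return minutes * 60.0 + seconds;
}
```

Taking entry 3 as the baseline, the run time grows by roughly 5.6% for entry 4 (354 s to 374 s) and 5.1% for entry 5 (372 s), while the text size grows by roughly 26.9% and 21.5%. Measured against entry 1 (350 s), the same two runs are about 6.9% and 6.3% slower, which is closer to the 6-7% mentioned in the message.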
Re: birthpoints in rtl.
Steven Bosscher wrote:
> On Tue, Mar 4, 2008 at 7:58 PM, Diego Novillo <[EMAIL PROTECTED]> wrote:
>> Both PHIs and birthpoints are merely factoring devices that let you cut down the number of UD links. They don't need to be part of the IL, much like none of the DF objects are part of the RTL IL.
>
> Maybe they don't need to be, but it may be useful to have them anyway.
>
>> > What about keeping things up-to-date after applying some transformations? It is already hard to keep UD/DU chains up-to-date now (I don't think any pass successfully does so right now). This should be a lot easier if you fully factorize your UD chains, right?
>>
>> In theory, yes. Code for keeping these things up-to-date already exists in GIMPLE SSA.
>
> That code is IMHO just awfully ugly. And slow too, last I checked. I don't know if this is still true, but update_ssa used to walk a huge part of the dominator tree even for seemingly trivial updates. We should not want that on RTL. I don't think we should allow transformations on RTL that are too hard to manually update the FUD chains somehow.
>
> Gr.
> Steven

There are many differences between fud/birthpoints and real ssa:

1) In ssa, the operands of the phis and the renaming contain information. The operands are paired with the cfg edges that the values come in on. In fud/birthpoints there is no such pairing or renaming. For some problems, like conditional constant, this pairing and renaming is what makes the algorithm work. You actually do not get the same answer (you get an inferior but still correct answer) if you do not use the pairing and renaming.

2) There are two "kinds" of ssa algorithms that are used in gcc. There are dirty ssa algorithms and "clean" algorithms. Dirty algorithms do not keep ssa up to date as you make the transformations. Clean ones do.

I have never understood why it was necessary for gcc to use so many dirty ssa algorithms. When NaturalBridge built our compiler, we were able to use almost exclusively clean algorithms. Tricks like loop closed ssa form help, but in general this is a matter of care on the algorithmic design side and implementation. We never had something like rebuild ssa at naturalbridge.

I do not know how much this changes with memory ssa. I assume clean algorithms are harder here, but I have no experience with it.

I have had particularly heated discussions with zdenek over the years where he asserted that it was not worth it/impossible to think about clean ssa algorithms and i showed him simple tricks to keep things up to date. I believe he just simply ignored me.

fud/birthpoints are generally harder to develop clean algorithms for. I have never seen any published. The information in the phis really helps you develop clean algorithms.

I would love to see the rtl back end use phis and renaming rather than fuds/birthpoints. The thing is that for phi functions to really be profitable, you need to have a large number of passes in a row that are ssa clean. So my plan was to basically start small and just use fud/birthpoints to control the space/time of the existing suite of passes.

Kenny
Re: birthpoints in rtl.
"Steven Bosscher" <[EMAIL PROTECTED]> writes:
> For the location of the extra instructions, I would *not* keep them on the side. If you have something special going on, my motto is: "Make it explicit".

Going back to something discussed upthread: would you expect to use this for hard regs as well as pseudos? No-op moves aren't necessarily supported for all hard registers. E.g. MIPS doesn't have patterns for LO <- LO, even though LO is a normal non-fixed register. You can also have hard registers that only appear in fixed patterns, such as condition code REGs.

If we went for an explicit move, I assume we would either have to (a) discount hard regs that can't be moved, (b) force backends to allow all no-op moves or (c) circumvent the backend somehow. (Jan suggested a USE for (c), but I assume we'd want some sort of definition too.)

Kenny said that pseudos-only was better than nothing, and I can't disagree with that. But one of the nice things about the on-the-side idea is that you have none of these problems. There should be nothing special about hard regs.

I take your point about not wanting something special going on behind the scenes. But these insns seem pretty special in their own right, especially if we go for (a) or (c). Even if we go for (b), wouldn't optimisers need to know that they shouldn't just delete the moves?

Richard
Re: birthpoints in rtl.
Richard Sandiford wrote:
> "Steven Bosscher" <[EMAIL PROTECTED]> writes:
>> For the location of the extra instructions, I would *not* keep them on the side. If you have something special going on, my motto is: "Make it explicit".
>
> Going back to something discussed upthread: would you expect to use this for hard regs as well as pseudos? No-op moves aren't necessarily supported for all hard registers. E.g. MIPS doesn't have patterns for LO <- LO, even though LO is a normal non-fixed register. You can also have hard registers that only appear in fixed patterns, such as condition code REGs.
>
> If we went for an explicit move, I assume we would either have to (a) discount hard regs that can't be moved, (b) force backends to allow all no-op moves or (c) circumvent the backend somehow. (Jan suggested a USE for (c), but I assume we'd want some sort of definition too.)
>
> Kenny said that pseudos-only was better than nothing, and I can't disagree with that. But one of the nice things about the on-the-side idea is that you have none of these problems. There should be nothing special about hard regs.
>
> I take your point about not wanting something special going on behind the scenes. But these insns seem pretty special in their own right, especially if we go for (a) or (c). Even if we go for (b), wouldn't optimisers need to know that they shouldn't just delete the moves?
>
> Richard

This is the first concrete argument that i have seen for keeping them on the side. The rest of the discussions (including my own) have been mostly beauty as opposed to truth.

From my point of view, this is a killer argument that if we want to build fuds/birthpoints for all regs, the info must be on the side.

I want to build the info for everything because i want to ditch the reaching defs way of building chains in favor of building them using the existing ssa technology. I do not want to have to call both techniques to build the chains.

kenny
Re: birthpoints in rtl.
On 3/4/08 2:38 PM, Kenneth Zadeck wrote:
> Steven Bosscher wrote:
>> On Tue, Mar 4, 2008 at 7:58 PM, Diego Novillo <[EMAIL PROTECTED]> wrote:
>>> Both PHIs and birthpoints are merely factoring devices that let you cut down the number of UD links. They don't need to be part of the IL, much like none of the DF objects are part of the RTL IL.
>>
>> Maybe they don't need to be, but it may be useful to have them anyway.
>>
>>> > What about keeping things up-to-date after applying some transformations? It is already hard to keep UD/DU chains up-to-date now (I don't think any pass successfully does so right now). This should be a lot easier if you fully factorize your UD chains, right?
>>>
>>> In theory, yes. Code for keeping these things up-to-date already exists in GIMPLE SSA.
>>
>> That code is IMHO just awfully ugly. And slow too, last I checked. I don't know if this is still true, but update_ssa used to walk a huge part of the dominator tree even for seemingly trivial updates. We should not want that on RTL. I don't think we should allow transformations on RTL that are too hard to manually update the FUD chains somehow.
>>
>> Gr.
>> Steven
>
> There are many differences between fud/birthpoints and real ssa:
>
> 1) In ssa, the operands of the phis and the renaming contain information. The operands are paired with the cfg edges that the values come in on. In fud/birthpoints there is no such pairing or renaming.

Yes, there is in FUD chains. They keep PHI nodes with arguments paired to their edges. Birthpoints do not keep this pairing.

> 2) There are two "kinds" of ssa algorithms that are used in gcc. There are dirty ssa algorithms and "clean" algorithms. Dirty algorithms do not keep ssa up to date as you make the transformations. Clean ones do.

Popular demand. Passes are encouraged to keep SSA up-to-date themselves, but we also have mechanisms for: (1) an API to register SSA updates in a sub-graph, (2) a way of introducing new symbols and having them put in SSA form.

When FUD chains are altered, they can be renamed from scratch by putting their symbols in the to-rename list, or they can be kept manually by each pass. The latter is usually faster.

> I have never understood why it was necessary for gcc to use so many dirty ssa algorithms. When NaturalBridge built our compiler, we were able to use almost exclusively clean algorithms. Tricks like loop closed ssa form help, but in general this is a matter of care on the algorithmic design side and implementation. We never had something like rebuild ssa at naturalbridge. I do not know how much this changes with memory ssa.

Memory SSA *is* FUD chains.

> I assume clean algorithms are harder here, but I have no experience with it.

Not all that harder, though it's generally easier to just mark the affected symbols for renaming. That makes things slower, though. The updater does not have to put the whole program in SSA form again, but it does have to traverse the whole CFG looking for defs/uses of the affected symbols.

> I would love to see the rtl back end use phis and renaming rather than fuds/birthpoints. The thing is that for phi functions to really be profitable, you need to have a large number of passes in a row that are ssa clean. So my plan was to basically start small and just use fud/birthpoints to control the space/time of the existing suite of passes.

Makes sense.

Diego.
Re: birthpoints in rtl.
Hi,
> > 1) In ssa, the operands of the phis and the renaming contain information. The operands are paired with the cfg edges that the values come in on. In fud/birthpoints there is no such pairing or renaming. For some problems, like conditional constant, this pairing and renaming is what makes the algorithm work. You actually do not get the same answer (you get an inferior but still correct answer) if you do not use the pairing and renaming.

I must be quite lost here. Non-rewriting SSA (or what I think a FUD chain is) is in my view essentially just an alternative representation of an SSA program. Instead of having SSA_NAMES and PHI nodes in your IL directly, they sit in an on-the-side data structure. They hold the same information: version numbers and PHI nodes associated with edges of the CFG. For optimization passes they are however 100% equivalent, you just look at different places in memory, which should be more or less hidden by abstraction.

Surely with this representation all the SSA analysis algorithms will work, since what you see is SSA form. The difference is that you can't simply use a particular SSA name at any place in a program without adding code to copy the value to a register at the place it is defined, to be sure that the original location is not overwritten.

This is relatively little extra hassle compared to rewritten SSA form, and in the case of conditional constant propagation you don't need to worry even about that. It is not too different from the discussions on whether you should have an on-the-side CFG and duplicate the info in goto statements, or the CFG as part of the IL.

Given that RTL deals with architectural details like partial writes or hard registers, it seems to make sense to actually target FUD (or non-rewriting SSA) rather than trying to adjust RTL to allow SSA in some form on all those constructs explicitly. Or at least it doesn't seem significantly inferior to me, and it is a lot easier to accomplish.

Honza
Re: birthpoints in rtl.
On Tue, Mar 4, 2008 at 8:47 PM, Richard Sandiford <[EMAIL PROTECTED]> wrote:
> "Steven Bosscher" <[EMAIL PROTECTED]> writes:
> Going back to something discussed upthread: would you expect to use this for hard regs as well as pseudos? No-op moves aren't necessarily supported for all hard registers. E.g. MIPS doesn't have patterns for LO <- LO, even though LO is a normal non-fixed register. You can also have hard registers that only appear in fixed patterns, such as condition code REGs.

Yes, for hard registers you can't use this. Another example is the loop counter register on ia64, or the flags register on i386.

You should have a look at the history of RTL SSA for hard registers (http://gcc.gnu.org/ml/gcc-patches/2000-07/msg01285.html and thread). They used to put selected hard regs into SSA form. Lessons learned: don't do that. I think the same applies to FUD chains for hard registers.

> Kenny said that pseudos-only was better than nothing, and I can't disagree with that. But one of the nice things about the on-the-side idea is that you have none of these problems. There should be nothing special about hard regs.

Uh, hard registers *are* special. And, also quite important, what would you *do* with FUD chains for hard registers? We don't optimize many things with hard registers, usually just because it's harder to do than for pseudos. I don't think FUD chains would change that.

> I take your point about not wanting something special going on behind the scenes. But these insns seem pretty special in their own right, especially if we go for (a) or (c). Even if we go for (b), wouldn't optimisers need to know that they shouldn't just delete the moves?

Old RTL SSA did (a). It didn't work. Neither (b) nor (c) seem viable ideas to me...

Gr.
Steven
Re: birthpoints in rtl.
Jan Hubicka wrote:
> Hi,
>> 1) In ssa, the operands of the phis and the renaming contain information. The operands are paired with the cfg edges that the values come in on. In fud/birthpoints there is no such pairing or renaming. For some problems, like conditional constant, this pairing and renaming is what makes the algorithm work. You actually do not get the same answer (you get an inferior but still correct answer) if you do not use the pairing and renaming.
>
> I must be quite lost here. Non-rewriting SSA (or what I think a FUD chain is) is in my view essentially just an alternative representation of an SSA program. Instead of having SSA_NAMES and PHI nodes in your IL directly, they sit in an on-the-side data structure. They hold the same information: version numbers and PHI nodes associated with edges of the CFG. For optimization passes they are however 100% equivalent, you just look at different places in memory, which should be more or less hidden by abstraction.
>
> Surely with this representation all the SSA analysis algorithms will work, since what you see is SSA form. The difference is that you can't simply use a particular SSA name at any place in a program without adding code to copy the value to a register at the place it is defined, to be sure that the original location is not overwritten.
>
> This is relatively little extra hassle compared to rewritten SSA form, and in the case of conditional constant propagation you don't need to worry even about that. It is not too different from the discussions on whether you should have an on-the-side CFG and duplicate the info in goto statements, or the CFG as part of the IL.
>
> Given that RTL deals with architectural details like partial writes or hard registers, it seems to make sense to actually target FUD (or non-rewriting SSA) rather than trying to adjust RTL to allow SSA in some form on all those constructs explicitly. Or at least it doesn't seem significantly inferior to me, and it is a lot easier to accomplish.
>
> Honza

I think that at this point, i have been convinced to:

1) use fud's rather than birthpoints because these do keep a slot for the value along each in edge.
2) keep the info on the side (see rsandiford's diverging thread).

I am not there on keeping extra names on the side. The advantage of the extra names is that it gives you extra freedom. The disadvantage is that either the transformations are more expensive or getting out of the renamed form is expensive.

Again, if we have a suite of contiguous converted passes in a row, i could be swayed in the renaming on the side direction, especially if they butted up against the rtl generation step and avoided the out of ssa for the tree ssa. But that is a long time in the future, and i do not see the short term benefits.

kenny
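The point about a slot per incoming edge can be illustrated with the meet step of conditional constant propagation. This is a sketch under assumptions, not code from GCC: the three-level lattice, struct lat and eval_join are made up here, and an unpaired factoring node is modelled by simply passing no edge information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Constant propagation lattice: undefined (top), a known constant,
   or varying (bottom). */
enum lat_kind { LAT_UNDEF, LAT_CONST, LAT_VARYING };

struct lat {
    enum lat_kind kind;
    long value;            /* meaningful only for LAT_CONST */
};

/* Lattice meet of two values. */
static struct lat lat_meet(struct lat a, struct lat b)
{
    if (a.kind == LAT_UNDEF)
        return b;
    if (b.kind == LAT_UNDEF)
        return a;
    if (a.kind == LAT_CONST && b.kind == LAT_CONST && a.value == b.value)
        return a;
    return (struct lat){ LAT_VARYING, 0 };
}

/* Evaluate a PHI-like node.  args[i] is the value arriving on incoming
   edge i.  When `executable` is non-NULL, arguments on edges not known
   to be executable are skipped.  Passing NULL models a factoring node
   without edge pairing, which has to merge every reaching def. */
static struct lat eval_join(const struct lat *args, const bool *executable,
                            size_t nedges)
{
    struct lat result = { LAT_UNDEF, 0 };
    for (size_t i = 0; i < nedges; i++) {
        if (executable != NULL && !executable[i])
            continue;
        result = lat_meet(result, args[i]);
    }
    return result;
}
```

With the constants 3 and 7 arriving on two edges and only the first edge executable, the paired form yields the constant 3, while the unpaired form can only answer "varying". That answer is still correct, just less precise, which matches the "inferior but still correct" remark earlier in the thread.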
Re: Possible GCC 4.3 driver regression caused by your patch
On Mon, Mar 03, 2008 at 08:11:30AM -0500, Hans-Peter Nilsson wrote:
> On Sun, 2 Mar 2008, Greg Schafer wrote:
> > Hi Carlos and Mark,
> >
> > Your "Relocated compiler should not look in $prefix" patch here:
> >
> > http://gcc.gnu.org/ml/gcc/2006-10/msg00280.html
> >
> > appears to have caused a regression in my GCC 4.3 testing.
>
> So *now* I know why my cross-test setup to (non-sysrooted) cris-axis-linux-gnu has trouble finding startfiles and pre-installed include files! Thanks! It seems Carlos' fix for the testsuite has some flaw I'll look into. At the very least, cutnpasting commands from the dejagnu .log files doesn't work; there's some environment variable (more than just GCC_EXEC_PREFIX, AFAICT). And some testsuites (forgot, maybe it was libgomp?) need to be adjusted too.

On a related note, this patch has also caused a testsuite regression for me as evidenced by:

WARNING: Could not compile g++.dg/compat/struct-layout-1 generator
WARNING: Could not compile gcc.dg/compat/struct-layout-1 generator

My context is building up a new system inside a chroot whereby I'm configuring GCC with --prefix=/usr but the "host" GCC is in some other prefix. The patch tries to fix the testsuite infrastructure by adding "set GCC_EXEC_PREFIX \"$(libdir)/gcc/\"" to site.exp. But in my scenario, this results in:

gcc: error trying to exec 'cc1': execvp: No such file or directory
gcc: error trying to exec 'cc1': execvp: No such file or directory
gcc: error trying to exec 'cc1': execvp: No such file or directory

when trying to build the generator programs.

Ughh.. This patch has caused regressions for me and others. There must be a way to keep relocated compilers happy and ALSO not break existing setups that have been working for many years.. I'll file a PR.

Thanks
Greg
Re: atomic accesses
The Linux kernel, and probably some user-space applications and libraries as well, depend on GCC guaranteeing (a variant of) the following:

"any access to a naturally aligned scalar object in memory that is not a bit-field will be performed by a single machine instruction whenever possible"

and it seems the current compiler actually does work like this.

Seems a pity to have the bit-field exception here, why is it there?

Bit-fields will generally require a read-modify-write instruction, and I don't think we actually guarantee to generate one right now.

Well if they do require more than one instruction, the rule has no effect ("whenever possible"). If they can be done in one instruction (as on the x86), then why not require this, why make a special case?

Because current GCC doesn't work like this AFAIK. I'm aiming for a documentation-only change here, we can always extend it later.

Fair enough, we don't want to document something we don't do!

Does this rule extend to the use of floating-point instructions to guarantee atomic access to 64-bit long_long_integer, as written it does!

Good point. Suggestions for better wording? How does

"any access to a naturally aligned scalar object in memory that is not a bit-field and fits in a general purpose integer machine register, will be performed by a single machine instruction whenever possible"

or

"any access to a naturally aligned scalar object in memory that is not a bit-field and not bigger than a long int, will be performed by a single machine instruction whenever possible"

sound?

Segher
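The conditions in the second proposed wording can be written down as a predicate. This is only a reading of the proposed text, not a statement of what GCC guarantees; the function names are made up, natural alignment is taken to mean "address is a multiple of the object's size", and whether the object is a bit-field is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when an object of `size` bytes at address `addr` is naturally
   aligned, i.e. its address is a multiple of its own size.  The size
   must be a power of two. */
static bool naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* The second proposed wording: naturally aligned and no bigger than a
   long int.  The "not a bit-field" condition cannot be checked from an
   address and size, so the caller has to rule that out. */
static bool covered_by_proposed_rule(uintptr_t addr, size_t size)
{
    return size <= sizeof(long) && naturally_aligned(addr, size);
}
```

Under this reading, an object twice the size of long, such as a 64-bit integer on a target with a 32-bit long, falls outside the rule even when it is aligned, which is the case raised about floating-point instructions.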
Re: birthpoints in rtl.
"Steven Bosscher" <[EMAIL PROTECTED]> writes:
> On Tue, Mar 4, 2008 at 8:47 PM, Richard Sandiford <[EMAIL PROTECTED]> wrote:
>> "Steven Bosscher" <[EMAIL PROTECTED]> writes:
>> Going back to something discussed upthread: would you expect to use this for hard regs as well as pseudos? No-op moves aren't necessarily supported for all hard registers. E.g. MIPS doesn't have patterns for LO <- LO, even though LO is a normal non-fixed register. You can also have hard registers that only appear in fixed patterns, such as condition code REGs.
>
> Yes, for hard registers you can't use this. Another example is the loop counter register on ia64, or the flags register on i386.
>
> You should have a look at the history of RTL SSA for hard registers (http://gcc.gnu.org/ml/gcc-patches/2000-07/msg01285.html and thread). They used to put selected hard regs into SSA form. Lessons learned: don't do that. I think the same applies to FUD chains for hard registers.

Yeah, I remember the RTL-SSA stuff. And for avoidance of doubt, I wasn't advocating explicit hard-reg moves. I was trying to figure out whether you were. Clearly you weren't. ;)

>> Kenny said that pseudos-only was better than nothing, and I can't disagree with that. But one of the nice things about the on-the-side idea is that you have none of these problems. There should be nothing special about hard regs.
>
> Uh, hard registers *are* special. And, also quite important, what would you *do* with FUD chains for hard registers? We don't optimize many things with hard registers, usually just because it's harder to do than for pseudos. I don't think FUD chains would change that.

I don't see why hard registers are special as far as FUD chains go. We have DU chains for hard regs, so why not FUDs too?

Richard
Re: plugin includes for MELT
* Basile STARYNKEVITCH wrote on Thu, Feb 28, 2008 at 06:56:35PM CET:
> Ralf Wildenhues wrote:
>> * Basile STARYNKEVITCH wrote on Thu, Feb 28, 2008 at 05:39:47PM CET:
>>> run-basilys.d: run-basilys.h \
>>>   $(CONFIG_H) $(SYSTEM_H) $(TIMEVAR_H) $(TM_H) $(TREE_H) $(GGC_H) \
>>>   tree-pass.h basilys.h gt-basilys.h
>>>     $(CC) -MT run-basilys-deps -MMD $(ALL_CFLAGS) $(ALL_CPPFLAGS) $<
>>
>> The build compiler may not be gcc and may not understand -MT and -MMD.
>
> Yes, I know. But how can I avoid that?

Use depcomp. Rather than explaining how to do that, it's easier if you wait until Tom puts the depcomp support code in the tree, then use it the same way the other stuff will.

>> Wasn't there a proposal to use depcomp in gcc a while ago?
>
> What is depcomp exactly?

A compiler wrapper that helps to do dependency computation as a side-effect of compilation. IIRC it first appeared in Automake (though its authors are also GCC developers, I'm not sure who initiated it).

>>> mkdir -p $(melt_build_include_dir); \
>>
>> mkdir -p is not portable, use $(mkinstalldirs).
>
> Is $(mkinstalldirs) usable for non-installed directories?

Yes.

> (in other words, it does not do any chown or chmod?

Exactly.

Cheers,
Ralf
Re: atomic accesses
> "Segher" == Segher Boessenkool <[EMAIL PROTECTED]> writes:

 Segher> Good point. Suggestions for better wording? How does

 Segher> "any access to a naturally aligned scalar object in memory that is not a bit-field and fits in a general purpose integer machine register, will be performed by a single machine instruction whenever possible"

 Segher> or

 Segher> "any access to a naturally aligned scalar object in memory that is not a bit-field and not bigger than a long int, will be performed by a single machine instruction whenever possible"

 Segher> sound?

As I said before, I think any words of this form SHOULD NOT be added. All it does is add words to the documentation that provide NO guarantee of anything -- but in a way that will confuse those who don't read it carefully enough into thinking that they DID get some sort of guarantee.

In other words, a statement like that has clear negative value.

paul
Re: birthpoints in rtl.
On Tue, Mar 4, 2008 at 9:46 PM, Richard Sandiford <[EMAIL PROTECTED]> wrote:
> I don't see why hard registers are special as far as FUD chains go. We have DU chains for hard regs, so why not FUDs too?

We have them, but does anyone use them? Does anyone actually even compute them? (Apparently fwprop does.)

It all depends on what you want to do with this. If you want to make it easier to do the existing optimizations over a factored UD/DU web, then ignoring hard registers would have made sense: Easier, less expensive to compute; easier to maintain; easier to rewrite into SSA (pseudos are not shared); etc.

But Kenny has already said he wants to have a full replacement for reaching defs, so the point's become moot :-)

Gr.
Steven
Re: birthpoints in rtl.
> I think that at this point, i have been convinced to: > > 1) use fud's rather than birthpoints because these do keep a slot for > the value along each in edge. > 2) keep the info on the side (see rsandifors diverging thread). > > I am not there on keeping extra names on the side. The advantage of > the extra names is that it gives you extra freedom. the disadvantage > is that either the transformations are more expensive or getting out of > the renamed form is expensive. The names are equivalent to UD pointers: Either you can have version names or just consider the destination of the UD pointer to be the destination. Or am I still missing a point? > > Again, if we have a suite of contiguous converted passes in a row, i > could be swayed in the renaming on the side direction, especially if > they butted up against the rtl generation step and avoided the out of > ssa for the tree ssa. But that is a long time in the future, and i do Generating RTL and building FUD based on existing tree-SSA is doable. I am not sure how practical, however. The value can be to have a means of transferring fine-grained info to the RTL level. Definitely not a step we want to make tomorrow ;) I guess we can just keep rebuilding FUDs as we rebuild DU/UD on RTL now. It should not be any more expensive, I hope (definitely it should not have the extreme corner cases that DU/UD has; I am not sure how the average construction time for FUDs/SSA compares to DU/UD construction). The algorithms seem comparably complex to my eyes. Gradually we can update passes to maintain the info. It is all a bit slippery at the RTL level, since most transformations go through the RTL emit machinery, which is allowed to introduce fancy things, clobber registers, add temporaries and do all that stuff.
I believe that FUD on hard regs is doable and practical: I don't see how the rewriting-SSA problems hit by the RTL-SSA project map here. Overall, I believe the basic disappointing lesson from the RTL-SSA project was not SUBREGs/STRICT_LOW_PARTs and other issues, but the fact that RTL is so hard to modify: everything you do goes through target validation machinery or expansion and can behave irregularly, which does not play well with standard optimization algorithms. Plus there are ugly things like libcall or other notes that were a lot more important in the GCC of the RTL-SSA project's time. So in the end, adding a sane analysis framework didn't let you write the easy high-level transformations RTL-SSA was originally intended for. With the Gimple optimizer in place, however, we are not targeting this kind of stuff on RTL anymore. We want sane dataflow info for guiding basic stuff (DCE/CCP/register allocation/GCSE). Honza > not see the short term benefits. > > kenny
Re: birthpoints in rtl.
> > I think that at this point, i have been convinced to: > > > > 1) use fud's rather than birthpoints because these do keep a slot for > > the value along each in edge. > > 2) keep the info on the side (see rsandifors diverging thread). > > > > I am not there on keeping extra names on the side. The advantage of > > the extra names is that it gives you extra freedom. the disadvantage > > is that either the transformations are more expensive or getting out of > > the renamed form is expensive. > > The names are equivalent to UD pointers: Either you can have version > names or just consider the destination of UD pointer to be the > destination. Or am I still missing a point? ... well perhaps in better Czenglish ... Either you can have version names and build UD pointers by knowing the definition points of each version, or you can consider the UD pointer itself to be the version name. Honza
Re: birthpoints in rtl.
On 3/4/08 4:06 PM, Jan Hubicka wrote: > The names are equivalent to UD pointers: Either you can have version names or just consider the destination of UD pointer to be the destination. Or am I still missing a point? Nope, that's exactly right. Versioned names are useful for some things (mostly keeping attributes/values/etc in arrays indexed by name version), but straight pointers are also doable. > I believe that FUD on hard regs is doable and practical: I don't see how the rewriting SSA problems hit by RTL-SSA project map here and overall I believe the basic disappointment lesson from RTL-SSA project was not SUBREGs/STRICT_LOW_PARTs and other issues, but the fact that RTL is that hard to modify: everything you do go through target validation machinery or expansion and can behave irregularly that does not play well with standard optimization algorithms plus there are ugly things like libcall or other notes that was a lot more important in GCC of RTL-SSA project time. Yes, we should not try to do a rewriting SSA for the time being. In this we are all in agreement. Nobody is advocating a rewriting SSA form on RTL at the moment. Maybe in the future, but for now building FUD chains on the DF framework is Not Hard. DF already has support for rebuilding the UD chains. So, rebuilding FUDs will probably be straightforward and it may be a bit simpler. With FUD chains you are trading the complexity of computing dominance frontiers and (maybe) PHI pruning with the setting of more UD chains. Diego.
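For readers following the thread, here is a minimal sketch of what "a slot for the value along each in edge" could look like as a data structure. It is only an illustration under assumptions: the type and function names (struct def, struct use, def_on_edge, MAX_PREDS) are hypothetical and are not GCC's DF framework API. Each use carries exactly one link, and a merge definition at a control-flow join carries one reaching definition per incoming edge.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PREDS 4   /* illustrative limit on incoming edges, not a GCC constant */

enum def_kind { DEF_REAL, DEF_MERGE };

/* A definition of a register.  A DEF_MERGE node sits at a control-flow
   join and keeps one slot per incoming edge (hypothetical layout). */
struct def {
    enum def_kind kind;
    int regno;                     /* register being defined */
    size_t n_slots;                /* DEF_MERGE only: number of in edges */
    struct def *slot[MAX_PREDS];   /* DEF_MERGE only: reaching def per edge */
};

/* A use has a single "factored" link: either a real def or a merge. */
struct use {
    int regno;
    struct def *chain;
};

/* Return the definition that reaches a merge along the given in edge. */
struct def *def_on_edge(const struct def *merge, size_t edge)
{
    assert(merge != NULL && merge->kind == DEF_MERGE);
    assert(edge < merge->n_slots);
    return merge->slot[edge];
}
```

With this layout the use itself never needs a list of definitions; the fan-out lives only in the merge nodes, which is the trade-off against plain UD chains mentioned above.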
Re: atomic accesses
> As I said before, I think any words of this form SHOULD NOT be added. All it does is add words to the documentation that provide NO guarantee of anything -- but in a way that will confuse those who don't read it carefully enough into thinking that they DID get some sort of guarantee. The idea is to _do_ provide that guarantee. If the GCC code does not agree with the GCC documentation, the code has a bug ;-) > In other words, a statement like that has clear negative value. I disagree. People are relying on this undocumented GCC behaviour already, and when things break, chaos ensues. If we change this to be documented behaviour, at least it is clear where the problem lies (namely, with the compiler), and things can be fixed easily. The two big questions are: 1) Do we *want* to guarantee any behaviour in this area? 2) Exactly *what* behaviour? Segher
Re: atomic accesses
Segher Boessenkool wrote: >> As I said before, I think any words of this form SHOULD NOT be added. All it does is add words to the documentation that provide NO guarantee of anything -- but in a way that will confuse those who don't read it carefully enough into thinking that they DID get some sort of guarantee. > The idea is to _do_ provide that guarantee. If the GCC code does not agree with the GCC documentation, the code has a bug ;-) >> In other words, a statement like that has clear negative value. > I disagree. People are relying on this undocumented GCC behaviour already, and when things break, chaos ensues. If we change this to be documented behaviour, at least it is clear where the problem lies (namely, with the compiler), and things can be fixed easily. > The two big questions are: > 1) Do we *want* to guarantee any behaviour in this area? > 2) Exactly *what* behaviour? This would be a gcc extension. History does not favour such extensions: we've been unable to define them well enough, for one thing. Andrew.
Re: atomic accesses
> "Segher" == Segher Boessenkool <[EMAIL PROTECTED]> writes: >> As I said before, I think any words of this form SHOULD NOT be >> added. All it does is add words to the documentation that provide >> NO guarantee of anything -- but in a way that will confuse those >> who don't read it carefully enough into thinking that they DID get >> some sort of guarantee. Segher> The idea is to _do_ provide that guarantee. If the GCC code Segher> does not agree with the GCC documentation, the code has a bug Segher> ;-) >> In other words, a statement like that has clear negative value. Segher> I disagree. People are relying on this undocumented GCC Segher> behaviour already, and when things break, chaos ensues. If Segher> we change this to be documented behaviour, at least it is Segher> clear where the problem lies (namely, with the compiler), and Segher> things can be fixed easily. Segher> The two big questions are: Segher> 1) Do we *want* to guarantee any behaviour in this area? Segher> 2) Exactly *what* behaviour? Yes, that's the question. First of all, the text you supplied does not create any guarantee at all. It says that "whenever possible" GCC will do x. Translation: for any given bit of source, target, switches, etc., that means GCC may do x in that case -- and it also means it may NOT do x in that case. Either outcome is legal by the text you proposed. There is no bug in GCC, whether it does x or (not x). So you're not adding a guarantee. But even though it isn't a guarantee, it may cause some people to think it is one. In fact, I'm tempted to say it's doing that to you. Now, suppose we take out "whenever possible" and replace it by "always". Then it IS a guarantee, and if GCC generates multiple instructions, it's a GCC bug. (If we propose to follow this path, do we have any idea how many instances of that bug exist right now in the current code generators?) But what does such a statement guarantee? Atomic access? What exactly does "atomic access" mean? 
It might mean, as one of the earlier notes said, that in a single writer multiple reader setting you will only ever see the "before" or the "after" states but not the one in between. It's probably true for most architectures (perhaps even for all that GCC supports) that this limited interpretation of "atomic" is satisfied when the load or store is a single instruction, aligned, the right size, etc. Another possible interpretation of "atomic" is "if there are multiple writers, one write won't interfere with the other". For example, if one writer updates X, and another updates Y, two aligned variables adjacent in memory, the final outcome has the new X and the new Y. That interpretation in general is NOT satisfied simply by using a single instruction for store. Maybe it is on x86 -- but not necessarily so on RISC machines. So, even with the hard requirement for single instruction load/store, it isn't clear what conclusion a programmer is supposed to draw from the statement under consideration. The discussion is about atomicity. Talking about single instructions is seriously misleading, because there is only a weak connection between the two. It DOES NOT matter to a programmer whether a C assignment generates one instruction or twenty; what matters are the semantics guaranteed for that statement. If we want to have atomicity properties of plain language C constructs, let's have a statement of exactly what atomicity properties are to be guaranteed. NOT in terms of generated code, but in terms of abstract semantics. It may well be that the most desirable atomicity semantics are too expensive -- you'd end up with constraints similar to "volatile", or even more so. But suppose we could have a particular guarantee. Then we can see if what "people" are relying on is in fact addressed by that guarantee, or if they were expecting a stronger guarantee that they are simply NOT going to get from GCC (not unless they invoke specific atomic_foo builtins).
If the former, then GCC has cured a bug in the original application (perhaps at the expense of work in GCC); if the latter, then the application bug is still there. paul
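The multiple-writer case above (one writer updating X, another updating adjacent Y) can be replayed deterministically. The sketch below is an illustration only, assuming a hypothetical machine on which a sub-word store is carried out as load-word, modify, store-word; the function names are made up for this example and no real threads are involved. It shows one unlucky interleaving in which the update to X is lost.

```c
#include <assert.h>
#include <stdint.h>

/* Two adjacent 8-bit variables share one 32-bit memory word:
   X is byte 0 (bits 0..7), Y is byte 1 (bits 8..15). */

/* Replace one byte inside a private copy of the word. */
uint32_t set_byte(uint32_t word, unsigned index, uint8_t value)
{
    assert(index < 4);
    uint32_t shift = 8u * index;
    return (word & ~((uint32_t)0xFF << shift)) | ((uint32_t)value << shift);
}

/* Single-threaded replay of two writers whose read-modify-write
   sequences overlap.  Returns the final contents of the shared word. */
uint32_t interleaved_byte_stores(uint32_t memory, uint8_t new_x, uint8_t new_y)
{
    uint32_t a = memory;          /* writer A loads the word            */
    uint32_t b = memory;          /* writer B loads the same old word   */
    a = set_byte(a, 0, new_x);    /* A updates X in its private copy    */
    b = set_byte(b, 1, new_y);    /* B updates Y in its private copy    */
    memory = a;                   /* A stores the whole word            */
    memory = b;                   /* B stores the whole word, undoing X */
    return memory;
}
```

Each writer here performs what looks like a single store at the source level, yet the final word holds the new Y and the old X, which is why "single instruction" and "atomic" are not interchangeable.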
Re: atomic accesses
> AFAIK the only reason we don't break this rule is that doing so would > be grossly inefficient; there's nothing to stop any gcc back-end with > (say) seriously slow DImode writes from using two SImode writes instead. I'm fairly sure ARM already breaks this "rule". Currently it probably only affects postincrement addressing modes. However there is definite scope for splitting loads/stores (even SI->2*HI) when optimizing for size. Paul
Re: atomic accesses
Segher Boessenkool writes: >... People are relying on this undocumented GCC behaviour already, >and when things break, chaos ensues. GCC has introduced many changes over the years that have broken many programs that have relied on undocumented or unspecified behaviour. You won't find much sympathy for people who assume that GCC must behave in some way where there is no requirement for it to do so. >If we change this to be documented behaviour, at least it is clear >where the problem lies (namely, with the compiler), and things can be >fixed easily. I don't think you'll find any support for imposing a requirement on GCC that would always require it to use an "atomic" instruction when there is an alternative instruction or sequence of instructions that would be faster and/or shorter. I think your best bet along these lines would be adding __sync_fetch() and __sync_store() builtins, but doing so would be more difficult than a simple documentation change. Ross Ridge
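The __sync_fetch() and __sync_store() builtins named above are a proposal, not something GCC provides. As a rough sketch of how a program could get comparable behaviour from the __sync builtins GCC does document, the helpers below (my_sync_fetch and my_sync_store, hypothetical names) are built from __sync_fetch_and_add and __sync_bool_compare_and_swap. This assumes a target on which those builtins are implemented for long.

```c
#include <assert.h>
#include <stddef.h>

/* Atomic read: adding 0 returns the current value without changing it.
   (Hypothetical helper, not a GCC builtin.) */
long my_sync_fetch(long *p)
{
    assert(p != NULL);
    return __sync_fetch_and_add(p, 0L);
}

/* Atomic write: retry compare-and-swap until the new value is installed.
   A stale or torn read of *p only makes the swap fail and retry.
   (Hypothetical helper, not a GCC builtin.) */
void my_sync_store(long *p, long value)
{
    long old;
    assert(p != NULL);
    do {
        old = *p;
    } while (!__sync_bool_compare_and_swap(p, old, value));
}
```

The point of going through builtins is that the guarantee is attached to the operation the programmer asked for, rather than to whatever instruction sequence a plain assignment happens to produce.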
Re: [RFC] GCC caret diagnostics
"Manuel López-Ibáñez" <[EMAIL PROTECTED]> writes: > Here is a patch that gives us caret diagnostics in C/C++. There are a lot > of things that can be improved, but I wanted to get some > feedback on my current approach. > > Basically, I store a pointer linebuf in the line_map structure to a > character in the input file buffer. The character corresponds to the > first character in the line corresponding to TO_LINE in the line_map > structure. The downside of this is that the buffer cannot be freed > anymore. I am not sure whether this is better than storing a duplicate > of the line as gfortran does. The third approach would be to store an > offset and when generating diagnostics, reopen the file, fseek to the > offset and print that line. > > One line_map can contain information about several lines, so we still > need to find the correct position for a line within linebuf. That is > what the hack in expand_location is for. It would be nice to have a > way to point directly to the beginning of each line: multiple pointers > per line_map? I like it. I think the general approach is fine, but I think you should free all the information when the frontend is complete--e.g., when it calls cgraph_finalize_compilation_unit. That is, only give caret warnings for diagnostics from the frontend. Ian
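To make the "pointer to the first character of the line" idea concrete, here is a small sketch of the printing side only. It is not the patch under discussion: format_caret is a hypothetical helper, and it assumes the caller already has a pointer to the start of the line inside the still-allocated file buffer plus a 1-based column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the source line that starts at line_start (ending at '\n' or '\0')
   into out, then append a second line with '^' under the given 1-based
   column.  Tabs before the column are copied so the caret stays aligned.
   Returns 0 on success, -1 if the column is out of range or out is too
   small. */
int format_caret(const char *line_start, size_t column,
                 char *out, size_t out_size)
{
    size_t len, i, pos;

    assert(line_start != NULL && out != NULL);
    len = strcspn(line_start, "\n");

    if (column == 0 || column > len + 1)
        return -1;
    /* line + '\n' + (column - 1) padding + '^' + '\n' + '\0' */
    if (out_size < len + column + 3)
        return -1;

    memcpy(out, line_start, len);
    pos = len;
    out[pos++] = '\n';
    for (i = 0; i + 1 < column; i++)
        out[pos++] = (line_start[i] == '\t') ? '\t' : ' ';
    out[pos++] = '^';
    out[pos++] = '\n';
    out[pos] = '\0';
    return 0;
}
```

Because the helper only reads from line_start, it works with any of the three storage strategies mentioned in the message (keeping the buffer, duplicating the line, or re-reading the file), as long as the line text is available when the diagnostic is emitted.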
Re: Google Summer of Code 2008
"Doug Gregor" <[EMAIL PROTECTED]> writes: > I see that it is time to submit applications to be a mentor > organization for the Google Summer of Code. I've updated the GSoC wiki > page at: > > http://gcc.gnu.org/wiki/SummerOfCode > > with a class of projects I'm interested in; others should do the same. Thanks. I agree with Doug: please update the wiki page. > Who is responsible for actually submitting GCC's application to GSoC, > and who has done so in the past? I have done so in the past, and indeed I have already submitted an application for gcc for this year. Ian
Re: [4.3/4.4]: PATCH: PR target/35453: nmmintrin.h defines macros SIDD_XXX
"H.J. Lu" <[EMAIL PROTECTED]> writes: > Here is the patch for both gcc 4.3 and 4.4. OK for 4.3/4.4? Tested on > Linux/ia32 > and Linux/ia64 with gcc 4.3/4.4. > gcc/ > > 2008-03-03 H.J. Lu <[EMAIL PROTECTED]> > > PR target/35453 > * config/i386/smmintrin.h (SIDD_XXX): Renamed to ... > (_SIDD_XXX): This. > > gcc/testsuite/ > > 2008-03-03 H.J. Lu <[EMAIL PROTECTED]> > > PR target/35453 > * gcc.target/i386/sse4_2-pcmpestri-1.c: Replace SIDD_XXX with > _SIDD_XXX. > * gcc.target/i386/sse4_2-pcmpestri-2.c: Likewise. > * gcc.target/i386/sse4_2-pcmpestrm-1.c: Likewise. > * gcc.target/i386/sse4_2-pcmpestrm-2.c: Likewise. > * gcc.target/i386/sse4_2-pcmpistri-1.c: Likewise. > * gcc.target/i386/sse4_2-pcmpistri-2.c: Likewise. > * gcc.target/i386/sse4_2-pcmpistrm-1.c: Likewise. > * gcc.target/i386/sse4_2-pcmpistrm-2.c: Likewise. > * gcc.target/i386/sse4_2-pcmpstr.h: Likewise. This is OK for mainline. I will defer to an RM for 4.3, though my recommendation is that it should go into 4.3 if possible. Thanks. Ian
Re: [4.3/4.4]: PATCH: PR target/35453: nmmintrin.h defines macros SIDD_XXX
Ian Lance Taylor <[EMAIL PROTECTED]> writes: > This is OK for mainline. I will defer to an RM for 4.3, though my > recommendation is that it should go into 4.3 if possible. Sorry, the thread broke, and I didn't see that this had already been approved. Ian
Help with GCC on Cygwin
Hello Everyone, I am trying to do some development on the C Compiler in Cygwin and I am doing the following to build it: $ ../gcc-4.0.2/gcc/configure --prefix=/home/Balaji/Software_Tools/install --enable-languages="c" The problem I am getting is this: $ make all install TARGET_CPU_DEFAULT="" \ HEADERS="auto-host.h ansidecl.h config/i386/xm-cygwin.h" DEFINES="" \ /bin/sh ../gcc-4.0.2/gcc/mkconfig.sh config.h TARGET_CPU_DEFAULT="" \ HEADERS="config/i386/i386.h config/i386/unix.h config/i386/bsd.h config/i386/gas.h config/dbxcoff.h config/i386/cygming.h config/i386/cygwin.h defaults.h" DEFINES="" \ /bin/sh ../gcc-4.0.2/gcc/mkconfig.sh tm.h TARGET_CPU_DEFAULT="" \ HEADERS="auto-host.h ansidecl.h config/i386/xm-cygwin.h" DEFINES="" \ /bin/sh ../gcc-4.0.2/gcc/mkconfig.sh bconfig.h /home/Balaji/Software_Tools/gcc-4.0.2/compile gcc -c -g -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../gcc-4.0.2/gcc -I../gcc-4.0.2/gcc/build -I../gcc-4.0.2/gcc/../include -I../gcc-4.0.2/gcc/../libcpp/include -o build/genmodes.o ../gcc-4.0.2/gcc/genmodes.c /home/Balaji/Software_Tools/gcc-4.0.2/compile gcc -c -g -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../gcc-4.0.2/gcc -I../gcc-4.0.2/gcc/build -I../gcc-4.0.2/gcc/../include -I../gcc-4.0.2/gcc/../libcpp/include -o build/errors.o ../gcc-4.0.2/gcc/errors.c make: *** No rule to make target `../build-i686-pc-cygwin/libiberty/libiberty.a', needed by `build/genmodes.exe'. Stop. I am currently using cygwin on an x86 machine, gcc version 4.0.2 (I have to use this version...can't use a different one). Any help is very highly appreciated! Thanking You, Yours Sincerely, Balaji V. Iyer. PS. Here is the output I received right after I ran the configure command. checking build system type... i686-pc-cygwin checking host system type...
i686-pc-cygwin checking target system type... i686-pc-cygwin checking LIBRARY_PATH variable... ok checking GCC_EXEC_PREFIX variable... ok checking whether to place generated files in the source directory... no checking whether a default linker was specified... no checking whether a default assembler was specified... no checking for gcc... gcc checking for C compiler default output file name... a.exe checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... .exe checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ANSI C... none needed checking whether gcc and cc understand -c and -o together... yes checking how to run the C preprocessor... gcc -E checking for inline... inline checking for long long int... yes checking for __int64... no checking for egrep... grep -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking for void *... yes checking size of void *... 4 checking for short... yes checking size of short... 2 checking for int... yes checking size of int... 4 checking for long... yes checking size of long... 4 checking for long long... yes checking size of long long... 8 checking whether gcc accepts -Wno-long-long... yes checking whether gcc accepts -Wno-variadic-macros... no checking whether gcc accepts -Wold-style-definition... yes checking valgrind.h usability... no checking valgrind.h presence... no checking for valgrind.h... no checking whether make sets $(MAKE)... yes checking for gawk... gawk checking whether ln -s works... yes checking whether ln works... yes checking for ranlib... 
ranlib checking for a BSD compatible install... /usr/bin/install -c checking for cmp's capabilities... gnucompare checking for mktemp... yes checking for makeinfo... makeinfo checking for modern makeinfo... yes checking for recent Pod::Man... yes checking for flex... flex checking for bison... bison checking for nm... nm checking for ar... ar checking for GNU C library... no checking for ANSI C header files... (cached) yes checking whether time.h and sys/time.h may both be included... yes checking whether string.h and strings.h may both be included... yes checking for sys/wait.h that is POSIX.1 compatible... yes checking for limits.h... yes checking for stddef.h... yes checking for string.h... (cached) yes checking for strings.h... (cached) yes checking for stdlib.h... (cached) yes checking for time.h... yes checking for iconv.h... yes checking for fcntl.
Re: Help with GCC on Cygwin
"Balaji V. Iyer" <[EMAIL PROTECTED]> writes: > I am trying to do some development on the C Compiler in Cygwin and I > am doing the following to build it: gcc@gcc.gnu.org is the wrong mailing list. Please send any further e-mail to [EMAIL PROTECTED] Thanks. > $ ../gcc-4.0.2/gcc/configure Run ../gcc-4.0.2/configure, not ../gcc-4.0.2/gcc/configure. Ian
Re: static array with constant size
I'm trying to compile the following piece of code: static const int ln = 10; static int ar[ln]; I'm getting: storage size of 'ar' isn't constant size of variable 'ar' is too large Is the code legal? Can you provide me with references to its legality or a discussion about it? it seems to be compilable with MS cl.exe. Thanks
RE: Help with GCC on Cygwin
Thank you Ian. I did the modification you mentioned...now I am running into more problems. Now it is failing somewhere in libiberty.. here is the exact message (I just simply typed "make all install") (I get the same message when I just do "make") Configuring in fixincludes configure: loading cache ./config.cache checking build system type... i686-pc-cygwin checking host system type... i686-pc-cygwin checking target system type... i686-pc-cygwin checking for i686-pc-cygwin-gcc... gcc checking for C compiler default output file name... a.exe checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... .exe checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ANSI C... none needed checking how to run the C preprocessor... gcc -E checking for egrep... grep -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking stddef.h usability... yes checking stddef.h presence... yes checking for stddef.h... yes checking for stdlib.h... (cached) yes checking for strings.h... (cached) yes checking for unistd.h... (cached) yes checking fcntl.h usability... yes checking fcntl.h presence... yes checking for fcntl.h... yes checking sys/file.h usability... yes checking sys/file.h presence... yes checking for sys/file.h... yes checking for sys/stat.h... (cached) yes checking for clearerr_unlocked... no checking for feof_unlocked... no checking for ferror_unlocked... no checking for fflush_unlocked... no checking for fgetc_unlocked... no checking for fgets_unlocked... no checking for fileno_unlocked... no checking for fprintf_unlocked... 
no checking for fputc_unlocked... no checking for fputs_unlocked... no checking for fread_unlocked... no checking for fwrite_unlocked... no checking for getchar_unlocked... yes checking for getc_unlocked... yes checking for putchar_unlocked... yes checking for putc_unlocked... yes checking whether abort is declared... yes checking whether errno is declared... no checking whether clearerr_unlocked is declared... no checking whether feof_unlocked is declared... no checking whether ferror_unlocked is declared... no checking whether fflush_unlocked is declared... no checking whether fgetc_unlocked is declared... no checking whether fgets_unlocked is declared... no checking whether fileno_unlocked is declared... no checking whether fprintf_unlocked is declared... no checking whether fputc_unlocked is declared... no checking whether fputs_unlocked is declared... no checking whether fread_unlocked is declared... no checking whether fwrite_unlocked is declared... no checking whether getchar_unlocked is declared... yes checking whether getc_unlocked is declared... yes checking whether putchar_unlocked is declared... yes checking whether putc_unlocked is declared... yes checking for an ANSI C-conforming const... yes checking sys/mman.h usability... yes checking sys/mman.h presence... yes checking for sys/mman.h... yes checking for mmap... yes checking whether read-only mmap of a plain file works... yes checking whether mmap from /dev/zero works... no checking for MAP_ANON(YMOUS)... yes checking whether mmap with MAP_ANON(YMOUS) works... no checking whether to enable maintainer-specific portions of Makefiles... no updating cache ./config.cache configure: creating ./config.status config.status: creating Makefile config.status: creating mkheaders config.status: creating config.h Configuring in libiberty configure: creating cache ./config.cache checking whether to enable maintainer-specific portions of Makefiles... no checking for makeinfo... makeinfo checking for perl... 
perl checking build system type... i686-pc-cygwin checking host system type... i686-pc-cygwin checking for i686-pc-cygwin-ar... ar checking for i686-pc-cygwin-ranlib... ranlib checking for i686-pc-cygwin-gcc... gcc checking for C compiler default output file name... a.exe checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... .exe checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ANSI C... none needed checking how to run the C preprocessor... gcc -E checking whether gcc and cc understand -c and -o together... yes checking for an ANSI C-conforming const... yes checking for inline... inline checking whether byte ordering is bigendian... no checking for a BSD-compatible install... /usr/bin/install -c checking for sys/file.h... yes checking for sys/param.h... yes checking for limits.h... yes checking for stdlib.h... ye
Re: static array with constant size
On 3/4/08, Elazar Leibovich <[EMAIL PROTECTED]> wrote: > I'm trying to compile the following piece of code: > static const int ln = 10; > static int ar[ln]; > I'm getting: > storage size of 'ar' isn't constant > size of variable 'ar' is too large > Is the code legal? Can you provide me with references to its legality > or a discussion about it? it seems to be compilable with MS cl.exe. First, this is the wrong list; [EMAIL PROTECTED] is a better list. Second, this is valid C++98/C++03 but invalid C90/C99. In C90/C99, a const-qualified variable is not an integer constant expression, while in C++98/C++03 a const integral variable initialized with a constant expression is one. -- Pinski
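For code that has to compile as C, the usual workaround is to give the array bound as something C does treat as an integer constant expression. The sketch below shows two common options, an enumeration constant and a macro; the names LN, LN_MACRO, ar2, ar_len, ar2_len and self_check are illustrative and not from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* In C, "static const int ln = 10;" declares an object, not a constant
   expression, so "static int ar[ln];" is rejected for a static array.
   An enumeration constant is an integer constant expression. */
enum { LN = 10 };
static int ar[LN];

/* A macro also works, because it expands to a literal before parsing. */
#define LN_MACRO 10
static int ar2[LN_MACRO];

/* Element counts, computed at compile time via sizeof. */
size_t ar_len(void)  { return sizeof ar / sizeof ar[0]; }
size_t ar2_len(void) { return sizeof ar2 / sizeof ar2[0]; }

/* Small self-check a caller may run from main(). */
void self_check(void)
{
    assert(ar_len() == 10);
    assert(ar2_len() == 10);
}
```

The enumeration form keeps the constant visible to a debugger and scoped like an ordinary identifier, which is often the reason it is preferred over the macro.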