Re: [RFC]: use cgraph to emit alpha vas trampoline entry point
On Dec 30, 2011, at 3:20 PM, Richard Guenther wrote: > On Sat, Dec 24, 2011 at 8:00 PM, Steven Bosscher > wrote: >> On Tue, Dec 20, 2011 at 9:46 AM, Tristan Gingold wrote: >>> Hi, >>> >>> currently alpha/vms backend emits a trampoline entry point for all nested >>> functions. This is a waste of code space, as although nested functions are >>> very common in Ada, address of nested functions are only seldom taken. >>> >>> The fact that the address of a function is taken seems only be available in >>> cgraph. Is it OK to use cgraph in alpha.c ? >> >> Since no-one has answered yet, I'll just toss in my $0.02. >> (Hold on to them a bit, they may be worth a million Euro soon :-) >> >> I think that in general you cannot rely on cgraph in the backends, >> this has to be analyzed case-by-case. In your case I'm not sure, but I >> think it should be OK. >> >> Your patch uses cgraph in alpha_start_function, which is apparently >> only used for the target hook ASM_DECLARE_FUNCTION_NAME. This hook is >> called from varasm.c:assemble_start_function(), and this is in turn >> only called from final.c:rest_of_handle_final() to generate assembly >> from RTL, and from cgraphunit.c:assemble_thunk() to output assembly >> for MI thunks. AFAICT cgraph should be correct and complete at the >> stage when those two functions are called. Therefore your patch should >> be OK. >> >> Perhaps Honza can throw in his 0.02h? > > Err - are not _all_ backends using trampolines to represent address-taken > nested functions? No. > At least I remeber to see them for x86 and plain C > nested functions as well. So - is this really a target issue? Yes. > At the time > the address of a nested fn is taken the C frontend arranges to (dynamically?) > create the trampoline via some built-in. Some backends (AIX, ia64, powerpc64-elf, VMS) don't need trampolines at all because they use function descriptors. Tristan.
Re: [PATCH] Update copyright years
> Committed to trunk as every year... You might want to add this one to your list. Applied to trunk. 2012-01-02 Eric Botcazou * gnatvsn.ads (Current_Year): Bump to 2012. -- Eric Botcazou Index: gnatvsn.ads === --- gnatvsn.ads (revision 182780) +++ gnatvsn.ads (working copy) @@ -6,7 +6,7 @@ -- -- -- S p e c -- -- -- --- Copyright (C) 1992-2011, Free Software Foundation, Inc. -- +-- Copyright (C) 1992-2012, Free Software Foundation, Inc. -- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -92,7 +92,7 @@ package Gnatvsn is Verbose_Library_Version : constant String := "GNAT Lib v" & Library_Version; -- Version string stored in e.g. ALI files - Current_Year : constant String := "2011"; + Current_Year : constant String := "2012"; -- Used in printing copyright messages end Gnatvsn;
Patch ping
Hi! http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01539.html PR driver/48306, P2 - libiberty fix for gcc driver to find paths even when $PATH contains some gcc subdirectories http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01451.html - the passes.c and reload1.c memory leak fixes (opts-common.c already fixed) Jakub
[PATCH] Fix PR51679
Committed. Richard. 2012-01-02 Richard Guenther PR other/51679 * invoke.texi (fassociative-math): Remove spurious paranthesis. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 182780) +++ gcc/doc/invoke.texi (working copy) @@ -8169,7 +8169,7 @@ This violates the ISO C and C++ language computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on a code which relies on rounding behavior like -@code{(x + 2**52) - 2**52)}. May also reorder floating-point comparisons +@code{(x + 2**52) - 2**52}. May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required. This option requires that both @option{-fno-signed-zeros} and @option{-fno-trapping-math} be in effect. Moreover, it doesn't make
[committed] Fix g++.dg/opt/devirt2.C for mips*-linux-gnu
g++.dg/opt/devirt2.C was failing for mips64-linux-gnu because each call to xyzzy was accompanied by a R_MIPS_JALR .reloc statement that also mentioned xyzzy. This wouldn't happen if the compiler was built to use PLTs by default, so rather than look for 4 copies of xyzzy, it seemed better to add -mno-abicalls. Tested on mips64-linux-gnu and applied. Richard gcc/testsuite/ * g++.dg/opt/devirt2.C: Add -mno-abicalls for MIPS. Index: gcc/testsuite/g++.dg/opt/devirt2.C === --- gcc/testsuite/g++.dg/opt/devirt2.C 2011-12-30 17:39:32.0 + +++ gcc/testsuite/g++.dg/opt/devirt2.C 2011-12-30 17:42:30.0 + @@ -3,6 +3,8 @@ /* Using -mshort-calls avoids loading the function addresses in registers and thus getting the counts wrong. */ // { dg-additional-options "-mshort-calls" {target epiphany-*-*} } +// Using -mno-abicalls avoids a R_MIPS_JALR .reloc. +// { dg-additional-options "-mno-abicalls" { target mips*-*-* } } // { dg-final { scan-assembler-times "xyzzy" 2 { target { ! { alpha*-*-* hppa*-*-* ia64*-*-hpux* sparc*-*-* } } } } } // The IA64 and HPPA compilers generate external declarations in addition // to the call so those scans need to be more specific.
[PATCH] Fix PR51686
Committed. Richard. 2012-01-02 Richard Guenther PR bootstrap/51686 * Makefile.def (install-strip-gcc): Depend on install-strip-lto-plugin. * Makefile.in: Regenerate. Index: Makefile.def === --- Makefile.def(revision 182784) +++ Makefile.def(working copy) @@ -314,6 +314,7 @@ dependencies = { module=html-gcc; on=all dependencies = { module=install-gcc ; on=install-fixincludes; }; dependencies = { module=install-gcc ; on=install-lto-plugin; }; dependencies = { module=install-strip-gcc ; on=install-strip-fixincludes; }; +dependencies = { module=install-strip-gcc ; on=install-strip-lto-plugin; }; dependencies = { module=configure-libcpp; on=configure-libiberty; hard=true; }; dependencies = { module=configure-libcpp; on=configure-intl; }; Index: Makefile.in === --- Makefile.in (revision 182784) +++ Makefile.in (working copy) @@ -42999,6 +42999,7 @@ html-stagefeedback-gcc: maybe-all-build- install-gcc: maybe-install-fixincludes install-gcc: maybe-install-lto-plugin install-strip-gcc: maybe-install-strip-fixincludes +install-strip-gcc: maybe-install-strip-lto-plugin configure-libcpp: configure-libiberty configure-stage1-libcpp: configure-stage1-libiberty
[Fortran, committed] Require 16-byte reals for io_real_boz_[345].f90
The recent io_real_boz_[345].f90 tests were failing for 32-bit MIPS GNU/Linux, which doesn't support 16-byte reals. Fixed with the patch below. Tested on mips64-linux-gnu (where the test continue to pass for 64-bit multilibs) and applied as obvious. Richard gcc/testsuite/ * gfortran.dg/io_real_boz_3.f90: Require fortran_real_16. * gfortran.dg/io_real_boz_4.f90: Likewise. * gfortran.dg/io_real_boz_5.f90: Likewise. Index: gcc/testsuite/gfortran.dg/io_real_boz_3.f90 === --- gcc/testsuite/gfortran.dg/io_real_boz_3.f90 2011-12-30 17:47:59.0 + +++ gcc/testsuite/gfortran.dg/io_real_boz_3.f90 2011-12-30 17:48:03.0 + @@ -1,5 +1,6 @@ ! { dg-do run } ! { dg-options "-std=f2008" } +! { dg-require-effective-target fortran_real_16 } ! ! PR fortran/51407 ! Index: gcc/testsuite/gfortran.dg/io_real_boz_4.f90 === --- gcc/testsuite/gfortran.dg/io_real_boz_4.f90 2011-12-30 17:47:48.0 + +++ gcc/testsuite/gfortran.dg/io_real_boz_4.f90 2011-12-30 17:47:50.0 + @@ -1,5 +1,6 @@ ! { dg-do run } ! { dg-options "-std=f2003" } +! { dg-require-effective-target fortran_real_16 } ! ! PR fortran/51407 ! Index: gcc/testsuite/gfortran.dg/io_real_boz_5.f90 === --- gcc/testsuite/gfortran.dg/io_real_boz_5.f90 2011-12-30 17:47:24.0 + +++ gcc/testsuite/gfortran.dg/io_real_boz_5.f90 2011-12-30 17:47:40.0 + @@ -1,5 +1,6 @@ ! { dg-do run } ! { dg-options "-std=f2008" } +! { dg-require-effective-target fortran_real_16 } ! ! PR fortran/51407 !
[committed] Fix constexpr-rom.C for MIPS
MIPS uses an .rdata pseudo-op rather than an .rodata section for read-only data. Tested on mips64-linux-gnu and applied. I'm tempted to remove this bit of historical baggage now that we require fairly recent binutils, but that's for another day. Richard gcc/testsuite/ * g++.dg/cpp0x/constexpr-rom.C: Look for .rdata rather than rodata for MIPS. Index: gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C === --- gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C 2011-12-31 11:19:16.0 + +++ gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C 2011-12-31 12:31:44.0 + @@ -1,6 +1,7 @@ // PR c++/49673: check that test_data goes into .rodata // { dg-options -std=c++0x } -// { dg-final { scan-assembler "rodata" { target { *-*-linux-gnu || *-*-elf } } } } +// { dg-final { scan-assembler "\\.rdata" { target mips*-*-* } } } +// { dg-final { scan-assembler "rodata" { target { { *-*-linux-gnu || *-*-elf } && { ! mips*-*-* } } } } } struct Data {
[committed] Increaes MIPS branch cost for gcc.dg/pr46309.c
As Jakub says in the PR, the fix for PR 46309 indirectly depends on BRANCH_COST, and the PR is still open because of that. For the time being, it seems better to avoid the failure for MIPS targets by tuning for a processor with a high branch cost. Tested on mips64-linux-gnu and applied. Richard gcc/testsuite/ * gcc.dg/pr46309.c: Add -mtune=octeon2 for MIPS. Index: gcc/testsuite/gcc.dg/pr46309.c === --- gcc/testsuite/gcc.dg/pr46309.c 2011-12-31 11:12:38.0 + +++ gcc/testsuite/gcc.dg/pr46309.c 2011-12-31 11:14:19.0 + @@ -1,6 +1,9 @@ /* PR tree-optimization/46309 */ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-reassoc-details" } */ +/* The transformation depends on BRANCH_COST being greater than 1 + (see the notes in the PR), so try to force that. */ +/* { dg-additional-options "-mtune=octeon2" { target mips*-*-* } } */ int f1 (int a)
Re: [Patch, fortran] PR fortran/50981 segmentation fault when trying to access absent elemental actual arg
Hello Mikael, Mikael Morin wrote: Regression tested on x86_64-unknown-linux-gnu. OK for 4.7/4.6/4.5[/4.4] ? OK - thanks for the comprehensive patch explanation and for the patch itself. + else + { + /* Otherwise, evaluate the argument out of the loop and pass +a reference to the value. */ + gfc_conv_expr (&se, expr); s/out of/outside/ + if (dummy_arg != NULL + && dummy_arg->sym->attr.optional + && arg->expr + && arg->expr->symtree + && arg->expr->symtree->n.sym->attr.optional + && arg->expr->ref == NULL) + newss->info->data.scalar.can_be_null_ref = true; I wonder whether one needs to take special care for the following Fortran 2008 feature: "A null pointer can be used to denote an absent nonallocatable nonpoin- ter optional argument." - I guess, one doesn't. Tobias
Re: [Patch, fortran] Improve common function elimination
Thomas Koenig wrote: --- dependency.c(Revision 182430) +++ dependency.c(Arbeitskopie) @@ -245,7 +245,9 @@ gfc_dep_compare_functions (gfc_expr *e1, gfc_expr * 0 if e1 == e2 * -1 if e1< e2 * -2 if the relationship could not be determined - * -3 if e1 /= e2, but we cannot tell which one is larger. */ + * -3 if e1 /= e2, but we cannot tell which one is larger. + REAL and COMPLEX constants are only compared for equality + or inequality; if they are unequal, -2 is returned in all cases. */ Can we finally move to an ENUM? I think we slowly drift into the realm of magic numbers ... (I know that you didn't modify that part.) Steve Kargl wrote: On Wed, Dec 28, 2011 at 04:21:55PM +0100, Thomas Koenig wrote: OK for trunk? I did not test the patch, but it appears correct to me. OK. Same here: Not tested, but it looks OK. Tobias
[committed] XFAIL gcc.target/mips/dspr2-MULT{,U}.c
I've opened PR target/51729 to track the long-standing failures of gcc.target/mips/dspr2-MULT{,U}.c. The problem used to be in the final two scan-assembler tests, but in the last few months, the -ffixed-hi and -ffixed-lo options tripped some cost-consistency checking in IRA. The original scan-assembler failures resurface if that cost checking is disabled. -ffixed-hi and -ffixed-lo are not supported options, and really, any optimisation test that relies on them is pretty weak. This patch therefore removes them and XFAILs the failing tests. Tested on mips64-linux-gnu and applied. Richard gcc/testsuite/ PR target/51729 * gcc.target/mips/dspr2-MULT.c: Remove -ffixed-hi -ffixed-lo. XFAIL. * gcc.target/mips/dspr2-MULTU.c: Likewise. Index: gcc/testsuite/gcc.target/mips/dspr2-MULT.c === --- gcc/testsuite/gcc.target/mips/dspr2-MULT.c 2012-01-02 11:04:58.0 + +++ gcc/testsuite/gcc.target/mips/dspr2-MULT.c 2012-01-02 11:28:10.0 + @@ -1,11 +1,12 @@ /* Test MIPS32 DSP REV 2 MULT instruction. Tune for a CPU that has pipelined mult. */ /* { dg-do compile } */ -/* { dg-options "-mgp32 -mdspr2 -O2 -ffixed-hi -ffixed-lo -mtune=74kc" } */ +/* { dg-options "-mgp32 -mdspr2 -O2 -mtune=74kc" } */ +/* See PR target/51729 for the reason behind the XFAILs. */ /* { dg-final { scan-assembler "\tmult\t" } } */ -/* { dg-final { scan-assembler "ac1" } } */ -/* { dg-final { scan-assembler "ac2" } } */ +/* { dg-final { scan-assembler "ac1" { xfail *-*-* } } } */ +/* { dg-final { scan-assembler "ac2" { xfail *-*-* } } } */ typedef long long a64; Index: gcc/testsuite/gcc.target/mips/dspr2-MULTU.c === --- gcc/testsuite/gcc.target/mips/dspr2-MULTU.c 2012-01-02 11:04:58.0 + +++ gcc/testsuite/gcc.target/mips/dspr2-MULTU.c 2012-01-02 11:28:22.0 + @@ -1,11 +1,12 @@ /* Test MIPS32 DSP REV 2 MULTU instruction. Tune for a CPU that has pipelined multu. */ /* { dg-do compile } */ -/* { dg-options "-mgp32 -mdspr2 -O2 -ffixed-hi -ffixed-lo -mtune=74kc" } */ +/* { dg-options "-mgp32 -mdspr2 -O2 -mtune=74kc" } */ +/* See PR target/51729 for the reason behind the XFAILs. */ /* { dg-final { scan-assembler "\tmultu\t" } } */ -/* { dg-final { scan-assembler "ac1" } } */ -/* { dg-final { scan-assembler "ac2" } } */ +/* { dg-final { scan-assembler "ac1" { xfail *-*-* } } } */ +/* { dg-final { scan-assembler "ac2" { xfail *-*-* } } } */ typedef unsigned long long a64;
Re: [patch] Fix PR tree-optimization/51624
On Sat, Dec 31, 2011 at 12:33 AM, Eric Botcazou wrote: > This is the bootstrap failure of the Ada compiler on MIPS/IRIX, a recent > regression present on mainline and 4.6 branch. The stage 2 compiler > miscompiles the stage 3 compiler (sem_type.adb:Disambiguate) because of an > oversight in the fix for PR tree-opt/50569 which changed build_ref_for_model > to > replicate chains of COMPONENT_REFs instead of just the last ones. > > The problem is that, when build_ref_for_model is called on the RHS of an > aggregate assignment of the form: > > b1.f1 = b2 > > with the model associated with a child of the LHS (say b1.f1.f2), it will > build > something like: > > MEM [&b2 + -4B].f1.f2 > > which will wreak havoc downstream. I think that the most straightforward way > out is to stop going up the chain of COMPONENT_REFs as soon as the type of the > COMPONENT_REF matches that of the base. > > Bootstrapped on MIPS/IRIX and regtested on x86_64-suse-linux. OK after a full > testing cycle? Note that you'll get ICEs whenever TYPE_CANONICAL of some aggregate type is NULL (thus, TYPE_STRUCTURAL_EQUALITY), so this seems inherently fragile (similar to using TYPE_CANONICAL == TYPE_CANONICAL here, we'd break the TYPE_STRUCTURAL_EQUALITY case again). I think that either the original fix needs to be re-thought or we need to pass the base FIELD_DECL to build_ref_for_model. That is, why not stop extracting the component-refs in do { tree field = TREE_OPERAND (expr, 1); tree cr_offset = component_ref_field_offset (expr); gcc_assert (cr_offset && host_integerp (cr_offset, 1)); offset -= TREE_INT_CST_LOW (cr_offset) * BITS_PER_UNIT; offset -= TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field)); VEC_safe_push (tree, stack, cr_stack, expr); expr = TREE_OPERAND (expr, 0); type = TREE_TYPE (expr); } while (TREE_CODE (expr) == COMPONENT_REF); when expr == base? The whole point of course was to never need something like build_ref_for_model given that we have MEM_REFs. Thanks, Richard. > 2011-12-30 Eric Botcazou > > PR tree-optimization/51624 > * tree-sra.c (build_ref_for_model): When replicating a chain of > COMPONENT_REFs, stop as soon as the type of the COMPONENT_REF > matches that of the base. > > > -- > Eric Botcazou
[patch] PR 51006 fix bootstrap failure on NetBSD
libgcc/ChangeLog 2012-01-02 Jonathan Wakely PR bootstrap/51006 * enable-execute-stack-mprotect.c (getpagesize): Do not define for NetBSD. This removes the definition of getpagesize which is always present on BSD. Tested x86_64-unknown-netbsd5.1 and already approved privately by a NetBSD maintainer. Krister, could you confirm this is OK to commit to mainline? Index: libgcc/enable-execute-stack-mprotect.c === --- libgcc/enable-execute-stack-mprotect.c (revision 182775) +++ libgcc/enable-execute-stack-mprotect.c (working copy) @@ -1,5 +1,5 @@ /* Implement __enable_execute_stack using mprotect(2). - Copyright (C) 2011 Free Software Foundation, Inc. + Copyright (C) 2011, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -62,33 +62,6 @@ static int need_enable_exec_stack = 1; #endif -#if defined __NetBSD__ -/* Note that we go out of our way to use namespace-non-invasive calls - here. Unfortunately, there is no libc-internal name for mprotect(). */ - -#include - -extern int __sysctl (int *, unsigned int, void *, size_t *, void *, size_t); - -static int -getpagesize (void) -{ - static int size; - - if (size == 0) -{ - int mib[2]; - size_t len; - - mib[0] = CTL_HW; - mib[1] = HW_PAGESIZE; - len = sizeof (size); - (void) __sysctl (mib, 2, &size, &len, NULL, 0); -} - return size; -} -#endif /* __NetBSD__ */ - /* Attempt to turn on access permissions for the stack. Unfortunately it is not possible to make this namespace-clean.*/
Re: [patch] Fix PR tree-optimization/51624
> Note that you'll get ICEs whenever TYPE_CANONICAL of some > aggregate type is NULL (thus, TYPE_STRUCTURAL_EQUALITY), so this > seems inherently fragile (similar to using TYPE_CANONICAL == TYPE_CANONICAL > here, we'd break the TYPE_STRUCTURAL_EQUALITY case again). Could you elaborate? The patch was successfully regtested. > I think that either the original fix needs to be re-thought or we need to > pass the base FIELD_DECL to build_ref_for_model. > That is, why not stop extracting the component-refs in > > do { > tree field = TREE_OPERAND (expr, 1); > tree cr_offset = component_ref_field_offset (expr); > gcc_assert (cr_offset && host_integerp (cr_offset, 1)); > > offset -= TREE_INT_CST_LOW (cr_offset) * BITS_PER_UNIT; > offset -= TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field)); > > VEC_safe_push (tree, stack, cr_stack, expr); > > expr = TREE_OPERAND (expr, 0); > type = TREE_TYPE (expr); > } while (TREE_CODE (expr) == COMPONENT_REF); > > when expr == base? Because SRA uses the model of the LHS to build the access on the RHS, so this kind of equalitty condition cannot work. > The whole point of course was to never need something like > build_ref_for_model given that we have MEM_REFs. MEM_REF is nice, but SRA should stop rewriting component accesses that it doesn't scalarize, this is waste of time and memory, and introduces bugs. -- Eric Botcazou
Re: [RFH / Patch] PR 20140
On Mon, Jan 2, 2012 at 3:24 AM, Paolo Carlini wrote: > On 01/02/2012 02:49 AM, Jason Merrill wrote: >> >> On 01/01/2012 08:10 PM, Paolo Carlini wrote: >>> >>> The analysis is confirmed by the fact that the rather heavy handed >>> patchlet which I'm attaching, which uses copy_node to avoid the >>> corruption, "works". How do we normally handle this kind of situation? >> >> In most cases, trees are unshared at instantiation time, but that doesn't >> apply to constants like this. The patch is OK. > > Oh I see, thanks. Then I'm going to properly test it, etc, and if everything > goes well, I'll post what I will actually commit. I think it's enough to call copy_node for CONSTANT_CLASS_P objects. > Thanks again, > Paolo.
Re: Patch ping
On Mon, Jan 2, 2012 at 11:37 AM, Jakub Jelinek wrote: > Hi! > > http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01539.html > PR driver/48306, P2 > - libiberty fix for gcc driver to find paths even when > $PATH contains some gcc subdirectories > > http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01451.html > - the passes.c and reload1.c memory leak fixes (opts-common.c > already fixed) This one is ok. Thanks, Richard. > Jakub
Re: [patch] Fix PR tree-optimization/51624
On Mon, Jan 2, 2012 at 1:06 PM, Eric Botcazou wrote: >> Note that you'll get ICEs whenever TYPE_CANONICAL of some >> aggregate type is NULL (thus, TYPE_STRUCTURAL_EQUALITY), so this >> seems inherently fragile (similar to using TYPE_CANONICAL == TYPE_CANONICAL >> here, we'd break the TYPE_STRUCTURAL_EQUALITY case again). > > Could you elaborate? The patch was successfully regtested. If you have nested structure types that are TYPE_STRUCTURAL_EQUALITY then they have TYPE_CANONICAL == NULL_TREE and types_compatible_p will return false (it ICEd in previous releases I believe) and thus you won't stop attaching COMPONENT_REFs. So the fix isn't complete. >> I think that either the original fix needs to be re-thought or we need to >> pass the base FIELD_DECL to build_ref_for_model. > >> That is, why not stop extracting the component-refs in >> >> do { >> tree field = TREE_OPERAND (expr, 1); >> tree cr_offset = component_ref_field_offset (expr); >> gcc_assert (cr_offset && host_integerp (cr_offset, 1)); >> >> offset -= TREE_INT_CST_LOW (cr_offset) * BITS_PER_UNIT; >> offset -= TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field)); >> >> VEC_safe_push (tree, stack, cr_stack, expr); >> >> expr = TREE_OPERAND (expr, 0); >> type = TREE_TYPE (expr); >> } while (TREE_CODE (expr) == COMPONENT_REF); >> >> when expr == base? > > Because SRA uses the model of the LHS to build the access on the RHS, so this > kind of equalitty condition cannot work. Ugh. >> The whole point of course was to never need something like >> build_ref_for_model given that we have MEM_REFs. > > MEM_REF is nice, but SRA should stop rewriting component accesses that it > doesn't scalarize, this is waste of time and memory, and introduces bugs. I agree - I didn't know it does that. Do you have an example? Richard. > -- > Eric Botcazou
Re: [RFC, ARM][PATCH 5/5] Swap passes peephole2 and if_after_reload
On Fri, Dec 30, 2011 at 5:58 PM, Dmitry Melnik wrote: > After Thumb-2's peephole2 adds flag clobbering on suitable insns in order to > generate 16-bit encoding for them, if-conversion can't transform these insns > into cond_execs. In theory, if the instruction were converted to > conditional form, it would also use 16-bit encoding, so the flag clobbering > doesn't have to be added to force 16-bit encoding. > Swapping order of these passes has actually increased the number of > if-conversions (it results in 2% more it-blocks on SPEC2K INT and > 4% more "long" it-blocks that contain two or more instructions). However, > this actually caused 2038 code size regression on SPEC2K INT. At first, we > blamed this growth on the effects described in patches 1-4, but with these > fixes the regression has only reduced by 474 bytes. I.e. the more there are > conditional insns, the better is the code size reduction from patches 1-4, > but altered pass order still loses 1564 bytes to the original one. > What do you think about the order of these passes on Thumb-2? > > (The patch is obvious, and is not attached). I suppose switching them makes sense because if-conversion can create new peephole opportunities. Not sure if the reverse is also true? Richard.
Re: refine cast in collect2 for AIX, fixing bootstrap
On Fri, Dec 23, 2011 at 6:29 PM, Olivier Hainque wrote: > Hello, > > Past http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01691.html, > bootstrap still fails on AIX, from > > gcc/collect2.c:1484:25: error: to be safe all intermediate pointers in cast > from > 'char **' to 'const char **' must be 'const' qualified [-Werror=cast-qual] > gcc/collect2.c:1488:15: > > This patch fixes this by using CONST_CAST2, as seems appropriate for the > case at hand. > > Tested by checking that bootstrap proceeds and terminates after the > change on powerpc-ibm-aix5.3 > > OK ? Ok. Thanks, Richard. > Thanks in advance, > > Regards, > > Olivier > > -- > > 2011-12-23 Olivier Hainque > > * collect2.c (main): In AIX specific computations for vector > insertions, > use CONST_CAST2 to cast from char ** to const char **. >
Re: #undef fopen+freopen prior to #def in system.h, for aix bootstrap
On Fri, Dec 23, 2011 at 9:07 PM, Olivier Hainque wrote: > A minor update to provide a more precise ChangeLog: > >> * system.h: #undef fopen and freopen unconditionally. > Ok. Thanks, Richard. > 2011-12-23 Olivier Hainque > > * system.h: Prior to #define, #undef fopen and freopen unconditionally. > > libcpp/ > * system.h: Likewise. > >
Re: [RFC, ARM][PATCH 0/5] Enhancements to handling of Thumb-2 conditional insns
On Fri, Dec 30, 2011 at 08:39:51PM +0400, Dmitry Melnik wrote: > This series of patches solves few issues we found with Thumb-2 > conditional insns. These fixes include: > > Do you think some of this patches are OK for trunk? Note that none of the included patches look like stage3 material (and stage3 is going to end really soon anyway). While the patches can be reviewed now, they should be queued up until GCC 4.7 branches. Jakub
[Patch, fortran] - [OOP] type-bound operator call with non-trivial polymorphic operand
Committed as r182796. Thanks for the review Tobias. (BTW - I normally use -cp for my .diffs. I don't know what happened this time :-) ) PR46262 is completely fixed. The "lorentz" example of Ralph Kube needs the commented out allocate restoring at lorentz.f03:34. Similalrly, Arjen's 'particles' example works fine, once the inconsistency in INTENTs is fixed. The two testcases are significant demonstrations of what can now be done with gfortran. Thanks to one and all for examples and help. Cheers Paul PS An extra blank line was inadvertantly added to dump-parse-tree.c. 2012-01-02 Paul Thomas PR fortran/51529 * trans-array.c (gfc_array_allocate): Null allocated memory of newly allocted class arrays. PR fortran/46262 PR fortran/46328 PR fortran/51052 * interface.c(build_compcall_for_operator): Add a type to the expression. * trans-expr.c (conv_base_obj_fcn_val): New function. (gfc_conv_procedure_call): Use base_expr to detect non-variable base objects and, ensuring that there is a temporary variable, build up the typebound call using conv_base_obj_fcn_val. (gfc_trans_class_assign): Pick out class procedure pointer assignments and do the assignment with no further prcessing. (gfc_trans_class_array_init_assign, gfc_trans_class_init_assign gfc_trans_class_assign): Move to top of file. * gfortran.h : Add 'base_expr' field to gfc_expr. * resolve.c (get_declared_from_expr): Add 'types' argument to switch checking of derived types on or off. (resolve_typebound_generic_call): Set the new argument. (resolve_typebound_function, resolve_typebound_subroutine): Set 'types' argument for get_declared_from_expr appropriately. Identify base expression, if not a variable, in the argument list of class valued calls. Assign it to the 'base_expr' field of the final expression. Strip away all references after the last class reference. 2012-01-02 Paul Thomas PR fortran/46262 PR fortran/46328 PR fortran/51052 * gfortran.dg/typebound_operator_7.f03: New. * gfortran.dg/typebound_operator_8.f03: New.
Re: [Patch,AVR] Fix PR51345: split multilibs for SPH / no-SPH devices, Take #2
Denis Chertykov wrote: > 2011/12/19 Georg-Johann Lay : >> Joseph S. Myers wrote: >>> On Sun, 18 Dec 2011, Georg-Johann Lay wrote: >>> > This new file needs to have the standard copyright and license notices. > It's desirable to generate such notices in the output files as well. What is the right copyright for the generated files? >>> See other examples. >> There is more than one flavour if the license... From what you wrote it's >> okay >> to use GPL in all cases even if included libgcc, so I used that text. >> >> Besides printing license, there are two minor changes compared to the >> original >> patch: >> >> * lib1funcs.S defines macro SPEED_DIV that trades speed against code size. >> It uses __AVR_HAVE_8BIT_SP__ now to get better estimation of flash size. >> >> * avr-c.c use AVR_HAVE_8BIT_SP instead of avr_current_device->short_sp to >> define respective built-in macros __AVR_HAVE_8/16BIT_SP__. >> >> Ok? >> >> Johann >> >> contrib/ >>PR target/51345 >>* gcc_update (files_and_dependencies): Add >>gcc/config/avr/t-multilib, gcc/config/avr/multilib.h. >> >> libgcc/ >>PR target/51345 >>* config/avr/lib1funcs.S: Remove FIXME comments. >>(SPEED_DIV): Depend on __AVR_HAVE_8BIT_SP__. >> gcc/ >>PR target/51345 >>* config.gcc (tm_file target=avr]): Add avr/avr-multilib.h >>(tmake_file target=avr): Add avr/t-multilib. >> >>* config/avr/avr-c.c (avr_cpu_cpp_builtins): Use AVR_HAVE_8BIT_SP >>to built-in define __AVR_HAVE_8BIT_SP__, __AVR_HAVE_16BIT_SP__. >>* config/avr/genmultilib.awk: New file. >>* config/avr/t-multilib: New auto-generated file. >>* config/avr/multilib.h: New auto-generated file. >>* config/avr/t-avr (AVR_MCUS): New variable. >>(genopt.sh): Use it. >>(s-mlib): Depend on t-multilib. >>(t-multilib, multilib.h): New dependencies. >>(s-avr-mlib): New rule to build t-multilib, multilib.h from AVR_MCUS. >>(MULTILIB_OPTIONS): Remove. >>(MULTILIB_MATCHES): Remove. >>(MULTILIB_DIRNAMES): Remove. >>(MULTILIB_EXCEPTIONS): Remove: >>* config/avr/genopt.sh: Don't use hard coded file name; >>pass AVR_MCUS from t-avr instead. >> > > Approved. > > Denis. Applied with the following change. This is needed if older version of [g]awk is installed that does not accept length() on arrays. The generated files are the same. Johann diff --git a/gcc/config/avr/genmultilib.awk b/gcc/config/avr/genmultilib.awk index bbff61c..7bc3b4a 100644 --- a/gcc/config/avr/genmultilib.awk +++ b/gcc/config/avr/genmultilib.awk @@ -36,6 +36,8 @@ BEGIN { option[""] = "" tiny_stack[""] = 1 comment = 1 +n_mcu = 0 +n_cores = 0 mtiny[0] = "" mtiny[1] = "tiny-stack" @@ -110,7 +112,8 @@ BEGIN { if (core == "avr1") next - cores[length (cores)] = core + cores[n_cores] = core + n_cores++ tiny_stack[core] = 0 option[core] = "mmcu=" core @@ -122,7 +125,8 @@ BEGIN { next tiny_stack[name] = $5 -mcu[length (mcu)] = name +mcu[n_mcu] = name +n_mcu++ option[name] = "mmcu=" name toCore[name] = core @@ -157,7 +161,7 @@ END { #(avr2, avr25, ...) x mtiny-stack sep = "" -for (c = 0; c < length (cores); c++) +for (c = 0; c < n_cores; c++) { m_options = m_options sep option[cores[c]] sep = "/" @@ -175,7 +179,7 @@ END { dot_excludes = "" m_raw_sp8 = "" -for (t = 0; t < length (mcu); t++) +for (t = 0; t < n_mcu; t++) { core = toCore[mcu[t]] @@ -212,8 +216,9 @@ END { # Compose MULTILIB_DIRNAMES, MULTILIB_EXEPTIONS and avr_multilib_raw -for (t = 0; t < length (mtiny); t++) - for (c = -1; c < length (cores); c++) +n_mtiny = 2 +for (t = 0; t < n_mtiny; t++) + for (c = -1; c < n_cores; c++) { if (c == -1) core = "" @@ -255,7 +260,7 @@ END { line = mdir - for (s = 0; s < length (cores); s++) + for (s = 0; s < n_cores; s++) { if (cores[s] == core) line = line " " option[cores[s]]
Re: [PATCH] PR c++/51462 - ICE in cx_check_missing_mem_inits
The below passed bootstrap and testing, OK for trunk? Thanks. Dodji Seketeli a écrit: > Jason Merrill writes: > >> On 12/16/2011 11:40 AM, Dodji Seketeli wrote: >>> /* It's OK to skip a member with a trivial constexpr ctor. >>> A constexpr ctor that isn't trivial should have been >>> added in by now. */ >>> gcc_checking_assert (!TYPE_HAS_COMPLEX_DFLT (ftype)); >>> >>> If you think I am trying too hard, maybe I could just get out early >>> from register_constexpr_fundef if errorcount is non-zero? >> >> Let's just check errorcount in this assert. > > OK, I am currently testing the patch below. > >> >>> [1]: By the way, I am just curious, why using gcc_checking_assert >>> instead of just gcc_assert? >> >> In general, I think it makes sense to use gcc_checking_assert for >> checks that either are expensive, or check conditions that aren't >> really problematic to deal with if they do occur. But I haven't been >> particularly methodical about using one or the other. > > I see. Thanks. > > gcc/cp/ > > PR c++/51462 > * semantics.c (cx_check_missing_mem_inits): Don't assert in case > of error. > > gcc/testsuite/ > > PR c++/51462 > * g++.dg/cpp0x/constexpr-99.C: New test. > --- > gcc/cp/semantics.c|3 ++- > gcc/testsuite/g++.dg/cpp0x/constexpr-99.C | 13 + > 2 files changed, 15 insertions(+), 1 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-99.C > > diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c > index 2f2a26a..f5d34c1 100644 > --- a/gcc/cp/semantics.c > +++ b/gcc/cp/semantics.c > @@ -6021,7 +6021,8 @@ cx_check_missing_mem_inits (tree fun, tree body, bool > complain) > /* It's OK to skip a member with a trivial constexpr ctor. >A constexpr ctor that isn't trivial should have been >added in by now. */ > - gcc_checking_assert (!TYPE_HAS_COMPLEX_DFLT (ftype)); > + gcc_checking_assert (!TYPE_HAS_COMPLEX_DFLT (ftype) > +|| errorcount != 0); > continue; > } > error ("uninitialized member %qD in % constructor", > diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-99.C > b/gcc/testsuite/g++.dg/cpp0x/constexpr-99.C > new file mode 100644 > index 000..13a5ea3 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-99.C > @@ -0,0 +1,13 @@ > +// Origin PR c++/51462 > +// { dg-options "-std=c++11" } > + > +struct A > +{ > + int i = 0; > +}; > + > +struct B > +{ > + A a; > +constexpr B() : a(0) {} // { dg-error "no matching function" } > +}; > -- > 1.7.6.4 -- Dodji
[AVR,committed]: Fix: -mbranch-cost defined twice
Committed the following change because -mbranch-cost was declared twice in avr.opt: http://gcc.gnu.org/viewcvs?view=revision&revision=182798 Johann
[PATCH] Handle if (x >> 1 == 0) in VRP (PR tree-optimization/51721)
Hi! As described in the PR, this is an attempt to avoid false positive -Warray-bounds warnings by adding some extra ASSERT_EXPRs in vrp, so that it can better optimize the code. For if (x >> cst1 == cst2) we can assert that (x - ((unsigned) cst2 << cst1) < (1U << cst1)) and e.g. for if (x >> cst1 >= cst2) we can assert that (x >= (cst2 << cst1)). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-01-02 Jakub Jelinek PR tree-optimization/51721 * tree-vrp.c (register_edge_assert_for_2): If comparing lhs of right shift by constant with an integer constant, add ASSERT_EXPRs for the rhs1 of the right shift. * gcc.dg/tree-ssa/vrp63.c: New test. * gcc.dg/pr51721.c: New test. --- gcc/tree-vrp.c.jj 2011-12-02 01:52:27.0 +0100 +++ gcc/tree-vrp.c 2012-01-02 09:33:08.676635403 +0100 @@ -4464,6 +4464,93 @@ register_edge_assert_for_2 (tree name, e } } + /* Similarly add asserts for NAME == CST and NAME being defined as + NAME = NAME2 >> CST2. */ + if (TREE_CODE_CLASS (comp_code) == tcc_comparison + && TREE_CODE (val) == INTEGER_CST) +{ + gimple def_stmt = SSA_NAME_DEF_STMT (name); + tree name2 = NULL_TREE, cst2 = NULL_TREE; + tree val2 = NULL_TREE; + unsigned HOST_WIDE_INT mask[2] = { 0, 0 }; + + /* Extract CST2 from the right shift. */ + if (is_gimple_assign (def_stmt) + && gimple_assign_rhs_code (def_stmt) == RSHIFT_EXPR) + { + name2 = gimple_assign_rhs1 (def_stmt); + cst2 = gimple_assign_rhs2 (def_stmt); + if (TREE_CODE (name2) == SSA_NAME + && host_integerp (cst2, 1) + && (unsigned HOST_WIDE_INT) tree_low_cst (cst2, 1) +< 2 * HOST_BITS_PER_WIDE_INT + && INTEGRAL_TYPE_P (TREE_TYPE (name2)) + && live_on_edge (e, name2) + && !has_single_use (name2)) + { + if ((unsigned HOST_WIDE_INT) tree_low_cst (cst2, 1) + < HOST_BITS_PER_WIDE_INT) + mask[0] = ((unsigned HOST_WIDE_INT) 1 + << tree_low_cst (cst2, 1)) - 1; + else + { + mask[1] = ((unsigned HOST_WIDE_INT) 1 +<< (tree_low_cst (cst2, 1) +- HOST_BITS_PER_WIDE_INT)) - 1; + mask[0] = -1; + } + val2 = fold_binary (LSHIFT_EXPR, TREE_TYPE (val), val, cst2); + } + } + + if (val2 != NULL_TREE + && TREE_CODE (val2) == INTEGER_CST + && simple_cst_equal (fold_build2 (RSHIFT_EXPR, + TREE_TYPE (val), + val2, cst2), val)) + { + enum tree_code new_comp_code = comp_code; + tree tmp, new_val; + + tmp = name2; + if (comp_code == EQ_EXPR || comp_code == NE_EXPR) + { + if (!TYPE_UNSIGNED (TREE_TYPE (val))) + { + unsigned int prec = TYPE_PRECISION (TREE_TYPE (val)); + tree type = build_nonstandard_integer_type (prec, 1); + tmp = build1 (NOP_EXPR, type, name2); + val2 = fold_convert (type, val2); + } + tmp = fold_build2 (MINUS_EXPR, TREE_TYPE (tmp), tmp, val2); + new_val = build_int_cst_wide (TREE_TYPE (tmp), mask[0], mask[1]); + new_comp_code = comp_code == EQ_EXPR ? LE_EXPR : GT_EXPR; + } + else if (comp_code == LT_EXPR || comp_code == GE_EXPR) + new_val = val2; + else + { + new_val = build_int_cst_wide (TREE_TYPE (val2), + mask[0], mask[1]); + new_val = fold_binary (BIT_IOR_EXPR, TREE_TYPE (val2), +val2, new_val); + } + + if (dump_file) + { + fprintf (dump_file, "Adding assert for "); + print_generic_expr (dump_file, name2, 0); + fprintf (dump_file, " from "); + print_generic_expr (dump_file, tmp, 0); + fprintf (dump_file, "\n"); + } + + register_new_assert_for (name2, tmp, new_comp_code, new_val, + NULL, e, bsi); + retval = true; + } +} + return retval; } --- gcc/testsuite/gcc.dg/tree-ssa/vrp63.c.jj2012-01-02 10:47:57.972533062 +0100 +++ gcc/testsuite/gcc.dg/tree-ssa/vrp63.c 2012-01-02 10:47:27.0 +0100 @@ -0,0 +1,345 @@ +/* PR tree-optimization/51721 */ +/* { dg-do link } */ +/* { dg-options "-O2" } */ + +extern void link_error (void); + +#define BITSM1 (sizeof (int) * __CHAR_BIT__ - 1) + +void +f1 (unsigned int s) +{ + if (s >> 1 == 0) +{ + if (s == 2 || s == -1U) + link_error (); +} + else +{ + if (s == 0 || s == 1) + link_error (); +} +}
Re: [PATCH] PR c++/51462 - ICE in cx_check_missing_mem_inits
OK. Jason
Re: [RFH / Patch] PR 20140
Hi, On Mon, Jan 2, 2012 at 3:24 AM, Paolo Carlini wrote: On 01/02/2012 02:49 AM, Jason Merrill wrote: On 01/01/2012 08:10 PM, Paolo Carlini wrote: The analysis is confirmed by the fact that the rather heavy handed patchlet which I'm attaching, which uses copy_node to avoid the corruption, "works". How do we normally handle this kind of situation? In most cases, trees are unshared at instantiation time, but that doesn't apply to constants like this. The patch is OK. Oh I see, thanks. Then I'm going to properly test it, etc, and if everything goes well, I'll post what I will actually commit. I think it's enough to call copy_node for CONSTANT_CLASS_P objects. In this branch we are always dealing with STRING_CST as init, I don't think we are going to discriminate much by using that predicate. Or maybe I'm not getting your point. Anyway, in the meanwhile I investigated the issue somewhat more, and my tentative fix indeed works but it's definitely a too big hammer in terms of memory management and memcpy for no reason: we would copy_node very often, also when a plain char array is initialized with a plain char string literal, or a wchar_t from a wchar_t. Bad, because currently many circumstances in which we do TREE_TYPE (init) = type are in fact benign: char - char and wchar_t - wchar_t definitely work unconditionally (and within an individual instantiation we don't have problems, only between instantiations). Thus I'm thinking the below would be better (and safe), copy_node should trigger much less often, essentially, only when an explicitly unsigned / signed char array is involved. Tested x86_64-linux. Thanks, Paolo. / /cp 2012-01-02 Paolo Carlini PR c++/20140 * typeck2.c (digest_init_r): Use copy_init when initializing unsigned and signed char arrays. /testsuite 2012-01-02 Paolo Carlini PR c++/20140 * g++.dg/template/init9.C: New. Index: testsuite/g++.dg/template/init9.C === --- testsuite/g++.dg/template/init9.C (revision 0) +++ testsuite/g++.dg/template/init9.C (revision 0) @@ -0,0 +1,12 @@ +// PR c++/20140 + +template void foo() +{ + unsigned char s[] = ""; +} + +int main() +{ + foo(); + foo(); +} Index: cp/typeck2.c === --- cp/typeck2.c(revision 182784) +++ cp/typeck2.c(working copy) @@ -902,6 +902,8 @@ digest_init_r (tree type, tree init, bool nested, } } + if (char_type != typ1) + init = copy_node (init); TREE_TYPE (init) = type; if (TYPE_DOMAIN (type) != 0 && TREE_CONSTANT (TYPE_SIZE (type))) {
[PATCH] Fix code quality regression caused by my gather vectorization patch
Hi! I've noticed that my gather vectorization patch unfortunately regressed code quality of gcc.target/i386/avx2-i64gatherd256-2.c gcc.target/i386/avx2-i64gatherd256-3.c gcc.target/i386/avx2-i64gatherd256-4.c gcc.target/i386/avx2-i64gatherps256-3.c gcc.target/i386/avx2-i64gatherps256-4.c tests. The problem is that after the unification of the gather auto-vectorization and gather intrinsics nothing optimizes well the new vec_select of the first half of gather pattern's result, while the vec_select is a nop, register allocation often chooses to allocate the gather pattern result in a different vector register from the result of the following extraction of first half of it. This patch fixes the regression by adding two patterns for combiner. On some of the above tests it saves 2 instructions, one others one. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-01-02 Jakub Jelinek * config/i386/sse.md (*avx2_gatherdi_3, *avx2_gatherdi_4): New patterns. --- gcc/config/i386/sse.md.jj 2011-11-07 20:32:09.0 +0100 +++ gcc/config/i386/sse.md 2011-11-07 20:52:54.0 +0100 @@ -12652,3 +12652,49 @@ (define_insn "*avx2_gatherdi_2" [(set_attr "type" "ssemov") (set_attr "prefix" "vex") (set_attr "mode" "")]) + +(define_insn "*avx2_gatherdi_3" + [(set (match_operand: 0 "register_operand" "=&x") + (vec_select: + (unspec:VI4F_256 + [(match_operand: 2 "register_operand" "0") +(match_operator: 7 "vsib_mem_operator" + [(unspec:P + [(match_operand:P 3 "vsib_address_operand" "p") + (match_operand: 4 "register_operand" "x") + (match_operand:SI 6 "const1248_operand" "n")] + UNSPEC_VSIBADDR)]) +(mem:BLK (scratch)) +(match_operand: 5 "register_operand" "1")] +UNSPEC_GATHER) + (parallel [(const_int 0) (const_int 1) +(const_int 2) (const_int 3)]))) + (clobber (match_scratch:VI4F_256 1 "=&x"))] + "TARGET_AVX2" + "vgatherq\t{%5, %7, %0|%0, %7, %5}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "vex") + (set_attr "mode" "")]) + +(define_insn "*avx2_gatherdi_4" + [(set (match_operand: 0 "register_operand" "=&x") + (vec_select: + (unspec:VI4F_256 + [(pc) +(match_operator: 6 "vsib_mem_operator" + [(unspec:P + [(match_operand:P 2 "vsib_address_operand" "p") + (match_operand: 3 "register_operand" "x") + (match_operand:SI 5 "const1248_operand" "n")] + UNSPEC_VSIBADDR)]) +(mem:BLK (scratch)) +(match_operand: 4 "register_operand" "1")] + UNSPEC_GATHER) + (parallel [(const_int 0) (const_int 1) +(const_int 2) (const_int 3)]))) + (clobber (match_scratch:VI4F_256 1 "=&x"))] + "TARGET_AVX2" + "vgatherq\t{%4, %6, %0|%0, %6, %4}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "vex") + (set_attr "mode" "")]) Jakub
Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)
Ayal Zaks writes: > + for (i = 0; i < ira_pressure_classes_num; i++) > +{ > + enum reg_class pressure_class; > + > + pressure_class = ira_pressure_classes[i]; > + > + if (max_reg_pressure[pressure_class] == 0) > + continue; > + > + if (dump_file) > + fprintf (dump_file, "%s=%d %d ", reg_class_names[pressure_class], > + max_reg_pressure[pressure_class], > + ira_available_class_regs[pressure_class]); > + > + if (max_reg_pressure[pressure_class] > + > ira_class_hard_regs_num[pressure_class]) > + { > + if (dump_file) > + fprintf (dump_file, "(pressure)\n"); > + > + pressure_p = true; > > you can "break;" now. FWIW, I thought the same thing at first, but I think Revital's way is better. It's nice to know when looking at dumps whether there is excess pressure in more than one class. This isn't performance-critical code after all. > however, you have everything setup to compute the amount of spill, so > suggest to do the following instead: > > pressure += max_reg_pressure[pressure_class] > - ira_class_hard_regs_num[pressure_class]); > > or better call the variable "spillage" instead of "pressure", and > return it. Current use will only check if it's zero or positive. I read this suggestion in the same way as Revital seems to have done: that we sum the pressure change over all classes. But that isn't the idea. Using N too many registers in one pressure class cannot be mitigated by leaving N registers in another pressure class unused. TBH, the original version of this function looked better to me. > Future use, as discussed offline, should compare this to the amount of > spillage the original loop (w/o modulo-scheduling) is expected to > have. I wonder if that's really worth pursuing though. For one thing, we've no control over where the spill code is going to be inserted, so the final ii is going to be a little arbitrary. That's less of a problem without SMS because we're able to reschedule the code after register allocation in order to move spills around. For another thing, the pressure of the original (unscheduled) code isn't necessarily indicative of what can be achieved by the normal scheduler with something like -fsched-pressure. We can't assume that the original block has minimum pressure. If we're serious about wanting to use SMS in high-pressure loops, perhaps we should consider trying to split live ranges in SMS itself. I'm not sure it makes sense to leave an SMSed loop that we "know" is going to need spill code. Richard
[PATCH] Fix PR51650
This fixes various instances of ICEs in dwarf2out_finish when processing the limbo DIE list. With LTO we can end up with limbos with RECORD_TYPE context for example when we partitioned away record-local statics and nothing references the record type itself. In general it seems safe to not expose LTO (with it's semi-broken debug information support) to this kind of sanity check in limbo DIE processing (I've done some real fixes in this area lately, but some are really hard to tackle if we do not want to make partitioning dependent on dwarf2out internals - we feed dwarf2out with types/decls in sometimes unpredicted order). Another kind of fix that crossed my mind would be iterating over the limbo list repeatedly and emitting origins there on-the-fly, but not sure if we'd not run into more issues that way. LTO bootstrapped and tested on x86_64-unknown-linux-gnu. This seems to be the last ICE caused by a LTO mozilla build with -g. Ok for trunk? Thanks, Richard. 2012-01-02 Richard Guenther PR debug/51650 * dwarf2out.c (dwarf2out_finish): When processing LTO IL allow any kind of parents for limbo DIEs. Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 182784) +++ gcc/dwarf2out.c (working copy) @@ -22501,15 +22501,22 @@ dwarf2out_finish (const char *filename) else if (TYPE_P (node->created_for)) context = TYPE_CONTEXT (node->created_for); - gcc_assert (context - && (TREE_CODE (context) == FUNCTION_DECL - || TREE_CODE (context) == NAMESPACE_DECL)); + /* With LTO we can end up with limbo DIEs with parents +we never output due to partitioning being done +unaware of things like class-scope statics or functions. */ + gcc_assert (in_lto_p + || (context + && (TREE_CODE (context) == FUNCTION_DECL + || TREE_CODE (context) == NAMESPACE_DECL))); - origin = lookup_decl_die (context); + if (context) + origin = lookup_decl_die (context); + else + origin = comp_unit_die (); if (origin) - add_child_die (origin, die); + add_child_die (origin, die); else - add_child_die (comp_unit_die (), die); + add_child_die (comp_unit_die (), die); } } }
Re: [PATCH] Fix code quality regression caused by my gather vectorization patch
On Mon, Jan 2, 2012 at 2:29 PM, Jakub Jelinek wrote: > I've noticed that my gather vectorization patch unfortunately regressed code > quality of > gcc.target/i386/avx2-i64gatherd256-2.c > gcc.target/i386/avx2-i64gatherd256-3.c > gcc.target/i386/avx2-i64gatherd256-4.c > gcc.target/i386/avx2-i64gatherps256-3.c > gcc.target/i386/avx2-i64gatherps256-4.c > tests. The problem is that after the unification of the gather > auto-vectorization and gather intrinsics nothing optimizes well the > new vec_select of the first half of gather pattern's result, while the > vec_select is a nop, register allocation often chooses to allocate the > gather pattern result in a different vector register from the result of > the following extraction of first half of it. > This patch fixes the regression by adding two patterns for combiner. > On some of the above tests it saves 2 instructions, one others one. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2012-01-02 Jakub Jelinek > > * config/i386/sse.md (*avx2_gatherdi_3, *avx2_gatherdi_4): > New patterns. OK. Thanks, Uros.
[committed] Tweak MIPS STORE_BY_PIECES_P
Back in: http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00944.html MIPS was changed to prefer block copies over store_by_pieces for all sizes of constant. I was a little doubtful about this change at the time, and perhaps I should have been less ready to accept it. The current failure of gfortran.dg/pr45636.f90 on mips64-linux-gnu made me look at it again. The patch below relaxes STORE_BY_PIECES_P in cases where I'm pretty confident it's usually (though not always) a win. I've continued to stick to Nigel's preference for block moves in the cases where the trade-offs are less clear. This can be relaxed further if people find the need. I had to make the memcpy-4.c test (which is effectively MIPS-specific) use a longer copy. As things stand, this longer copy gets implemented in a different way on 32-bit and 64-bit targets, so I also changed the test to require a minimum of two "mem/s/u"s, rather than a specific number. (64-bit hosts should probably use the same SWL/SWR sequence as 32-bit hosts, but that's a separate change.) This doesn't change the intent of the test, which was simply to make sure that constness didn't get lost during expand. Also, I had to make move_by_pieces_ninsns public, but that bit seemed obvious since the macros were already making use of it. Tested on mips64-linux-gnu and applied. Richard gcc/ * expr.h (move_by_pieces_ninsns): Declare. * expr.c (move_by_pieces_ninsns): Make external. * config/mips/mips-protos.h (mips_move_by_pieces_p): Declare. (mips_store_by_pieces_p): Likewise. * config/mips/mips.h (MOVE_BY_PIECES_P): Call mips_move_by_pieces_p. (STORE_BY_PIECES_P): Likewise mips_store_by_pieces_p. * config/mips/mips.c (mips_move_by_pieces_p): New function. (mips_store_by_pieces_p): Likewise. gcc/testsuite/ * gcc.dg/memcpy-4.c: Add nomips16 attribute for MIPS targets. Increase copy to 5 bytes. Look for at least two "mem/s/u"s, rather than a specific number. Index: gcc/expr.h === --- gcc/expr.h 2012-01-02 11:01:00.0 + +++ gcc/expr.h 2012-01-02 11:39:12.0 + @@ -367,6 +367,10 @@ extern bool set_storage_via_setmem (rtx, succeed. */ extern int can_move_by_pieces (unsigned HOST_WIDE_INT, unsigned int); +extern unsigned HOST_WIDE_INT move_by_pieces_ninsns (unsigned HOST_WIDE_INT, +unsigned int, +unsigned int); + /* Return nonzero if it is desirable to store LEN bytes generated by CONSTFUN with several move instructions by store_by_pieces function. CONSTFUNDATA is a pointer which will be passed as argument Index: gcc/expr.c === --- gcc/expr.c 2012-01-02 11:01:00.0 + +++ gcc/expr.c 2012-01-02 11:39:12.0 + @@ -123,9 +123,6 @@ struct store_by_pieces_d int reverse; }; -static unsigned HOST_WIDE_INT move_by_pieces_ninsns (unsigned HOST_WIDE_INT, -unsigned int, -unsigned int); static void move_by_pieces_1 (rtx (*) (rtx, ...), enum machine_mode, struct move_by_pieces_d *); static bool block_move_libcall_safe_for_call_parm (void); @@ -1016,7 +1013,7 @@ move_by_pieces (rtx to, rtx from, unsign /* Return number of insns required to move L bytes by pieces. ALIGN (in bits) is maximum alignment we can assume. */ -static unsigned HOST_WIDE_INT +unsigned HOST_WIDE_INT move_by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align, unsigned int max_size) { Index: gcc/config/mips/mips-protos.h === --- gcc/config/mips/mips-protos.h 2012-01-02 11:01:00.0 + +++ gcc/config/mips/mips-protos.h 2012-01-02 11:39:12.0 + @@ -239,6 +239,8 @@ extern void mips_split_call (rtx, rtx); extern bool mips_get_pic_call_symbol (rtx *, int); extern void mips_expand_fcc_reload (rtx, rtx, rtx); extern void mips_set_return_address (rtx, rtx); +extern bool mips_move_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int); +extern bool mips_store_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int); extern bool mips_expand_block_move (rtx, rtx, rtx); extern void mips_expand_synci_loop (rtx, rtx); Index: gcc/config/mips/mips.h === --- gcc/config/mips/mips.h 2012-01-02 11:01:00.0 + +++ gcc/config/mips/mips.h 2012-01-02 11:39:12.0 + @@ -2782,23 +2782,8 @@ #define MOVE_RATIO(speed) \ ? MIPS_MAX_MOVE_BYTES_STRAIGHT / MOVE_MAX \ : MIPS_CALL_RATIO / 2) -/* movmemsi is meant to generate code that is at least as good
[committed] Add missing earlyclobber to loadgp_newabi_
g++.dg/torture/pr37922.C -O3 -fomit-frame-pointer -funroll-loops was failing for n32 and n64 on mips64-linux-gnu because we were renaming the global pointer register to $25, even though $25 is needed by the sequence that sets up the global pointer. This in turn was caused by a missing earlyclobber in the loadgp_newabi_ pattern. Bit of a brown-paper-bag bug, sorry. Will backport it to the release branches soon. Richard gcc/ * config/mips/mips.md (loadgp_newabi_): Add missing earlyclobber. Index: gcc/config/mips/mips.md === --- gcc/config/mips/mips.md 2011-12-31 11:43:20.0 + +++ gcc/config/mips/mips.md 2011-12-31 11:43:28.0 + @@ -4838,7 +4838,7 @@ (define_expand "load_const_gp_" ;; of _gp from the start of this function. Operand 1 is the incoming ;; function address. (define_insn_and_split "loadgp_newabi_" - [(set (match_operand:P 0 "register_operand" "=d") + [(set (match_operand:P 0 "register_operand" "=&d") (unspec:P [(match_operand:P 1) (match_operand:P 2 "register_operand" "d")] UNSPEC_LOADGP))]
Ping^2: Fixing liveness information during prepare_shrink_wrap
Ping for this prepare_shrink_wrap patch: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00274.html It fixes some wrong-code regressions on MIPS16. Richard
Ping^2: Out-of-order update of new_spill_reg_store[]
Ping for this reload patch: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00266.html or perhaps the original: http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00663.html which fixes some wrong-code regressions on mips64-linux-gnu. Richard
Re: [Patch, fortran] Improve common function elimination
Hi Tobias and Steve, Same here: Not tested, but it looks OK. Committed (after Steve's mail already). Thanks a lot for the reviews! Thomas
[PATCH] Fix PR51730
I am testing the following patch to fix PR51730. Richard. 2012-01-02 Richard Guenther PR middle-end/51730 * fold-const.c (fold_comparison): Properly canonicalize tree offset and HOST_WIDE_INT bit position. * gcc.dg/fold-compare-6.c: New testcase. Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 182784) +++ gcc/fold-const.c(working copy) @@ -8886,6 +8886,14 @@ fold_comparison (location_t loc, enum tr indirect_base0 = true; } offset0 = TREE_OPERAND (arg0, 1); + if (host_integerp (offset0, 0) + && ((HOST_WIDE_INT) (TREE_INT_CST_LOW (offset0) * BITS_PER_UNIT) + / BITS_PER_UNIT + == (HOST_WIDE_INT) TREE_INT_CST_LOW (offset0))) + { + bitpos0 = TREE_INT_CST_LOW (offset0) * BITS_PER_UNIT; + offset0 = NULL_TREE; + } } base1 = arg1; @@ -8909,6 +8917,14 @@ fold_comparison (location_t loc, enum tr indirect_base1 = true; } offset1 = TREE_OPERAND (arg1, 1); + if (host_integerp (offset1, 0) + && ((HOST_WIDE_INT) (TREE_INT_CST_LOW (offset1) * BITS_PER_UNIT) + / BITS_PER_UNIT + == (HOST_WIDE_INT) TREE_INT_CST_LOW (offset1))) + { + bitpos1 = TREE_INT_CST_LOW (offset1) * BITS_PER_UNIT; + offset1 = NULL_TREE; + } } /* A local variable can never be pointed to by Index: gcc/testsuite/gcc.dg/fold-compare-6.c === --- gcc/testsuite/gcc.dg/fold-compare-6.c (revision 0) +++ gcc/testsuite/gcc.dg/fold-compare-6.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-original" } */ + +char digs[] = "0123456789"; +int foo (void) +{ + int xlcbug = 1 / (&(digs + 5)[-2 + (_Bool) 1] == &digs[4] ? 1 : -1); + return xlcbug; +} + +/* { dg-final { scan-tree-dump "xlcbug = 1;" "original" } } */ +/* { dg-final { cleanup-tree-dump "original" } } */
Re: [PATCH] Fix PR51650
It seems like using get_context_die instead of lookup_decl_die might do the trick. Jason
Re: [RFH / Patch] PR 20140
On 01/02/2012 08:25 AM, Paolo Carlini wrote: Bad, because currently many circumstances in which we do TREE_TYPE (init) = type are in fact benign: I don't think they are benign. Even when types are equivalent, we need to convert between them, as several things in the back end rely on pointer equality, and we don't want different instantiations changing the type in subtly different ways. Jason
Re: [RFH / Patch] PR 20140
On 01/02/2012 04:06 PM, Jason Merrill wrote: On 01/02/2012 08:25 AM, Paolo Carlini wrote: Bad, because currently many circumstances in which we do TREE_TYPE (init) = type are in fact benign: I don't think they are benign. Even when types are equivalent, we need to convert between them, as several things in the back end rely on pointer equality, and we don't want different instantiations changing the type in subtly different ways. Ok. Then yesterday's patch certainly works and you already approved. The below is what I'm going to apply. Thanks, Paolo. /cp 2012-01-02 Paolo Carlini PR c++/20140 * typeck2.c (digest_init_r): Use copy_init when initializing an array of chars. /testsuite 2012-01-02 Paolo Carlini PR c++/20140 * g++.dg/template/init9.C: New. Index: testsuite/g++.dg/template/init9.C === --- testsuite/g++.dg/template/init9.C (revision 0) +++ testsuite/g++.dg/template/init9.C (revision 0) @@ -0,0 +1,12 @@ +// PR c++/20140 + +template void foo() +{ + unsigned char s[] = ""; +} + +int main() +{ + foo(); + foo(); +} Index: cp/typeck2.c === --- cp/typeck2.c(revision 182784) +++ cp/typeck2.c(working copy) @@ -902,7 +902,11 @@ digest_init_r (tree type, tree init, bool nested, } } - TREE_TYPE (init) = type; + if (type != TREE_TYPE (init)) + { + init = copy_node (init); + TREE_TYPE (init) = type; + } if (TYPE_DOMAIN (type) != 0 && TREE_CONSTANT (TYPE_SIZE (type))) { int size = TREE_INT_CST_LOW (TYPE_SIZE (type));
C++ PATCH for c++/51666 (comma in NSDMI)
As discussed in the audit trail, this is DR 325 as it affects NSDMI rather than default arguments. So to fix it I re-used the same code for dealing with the default argument case. Tested x86_64-pc-linux-gnu, applying to trunk. commit bd815e955cfe15f7ecd36366bc79062278e8f2e5 Author: Jason Merrill Date: Sun Jan 1 23:44:24 2012 -0500 PR c++/51666 * parser.c (cp_parser_cache_defarg): Split out... (cp_parser_parameter_declaration): ...from here. (cp_parser_save_nsdmi): Use it. (cp_parser_cache_group): Remove CPP_COMMA support. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 0f5bb8e..7e6915c 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -2249,6 +2249,8 @@ static void cp_parser_pre_parsed_nested_name_specifier (cp_parser *); static bool cp_parser_cache_group (cp_parser *, enum cpp_ttype, unsigned); +static tree cp_parser_cache_defarg + (cp_parser *parser, bool nsdmi); static void cp_parser_parse_tentatively (cp_parser *); static void cp_parser_commit_to_tentative_parse @@ -17267,159 +17269,18 @@ cp_parser_parameter_declaration (cp_parser *parser, /* If the next token is `=', then process a default argument. */ if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)) { + token = cp_lexer_peek_token (parser->lexer); /* If we are defining a class, then the tokens that make up the default argument must be saved and processed later. */ if (!template_parm_p && at_class_scope_p () && TYPE_BEING_DEFINED (current_class_type) && !LAMBDA_TYPE_P (current_class_type)) - { - unsigned depth = 0; - int maybe_template_id = 0; - cp_token *first_token; - cp_token *token; - - /* Add tokens until we have processed the entire default - argument. We add the range [first_token, token). */ - first_token = cp_lexer_peek_token (parser->lexer); - while (true) - { - bool done = false; - - /* Peek at the next token. */ - token = cp_lexer_peek_token (parser->lexer); - /* What we do depends on what token we have. */ - switch (token->type) - { - /* In valid code, a default argument must be - immediately followed by a `,' `)', or `...'. */ - case CPP_COMMA: - if (depth == 0 && maybe_template_id) - { - /* If we've seen a '<', we might be in a - template-argument-list. Until Core issue 325 is - resolved, we don't know how this situation ought - to be handled, so try to DTRT. We check whether - what comes after the comma is a valid parameter - declaration list. If it is, then the comma ends - the default argument; otherwise the default - argument continues. */ - bool error = false; - tree t; - - /* Set ITALP so cp_parser_parameter_declaration_list - doesn't decide to commit to this parse. */ - bool saved_italp = parser->in_template_argument_list_p; - parser->in_template_argument_list_p = true; - - cp_parser_parse_tentatively (parser); - cp_lexer_consume_token (parser->lexer); - begin_scope (sk_function_parms, NULL_TREE); - cp_parser_parameter_declaration_list (parser, &error); - for (t = current_binding_level->names; t; t = DECL_CHAIN (t)) - pop_binding (DECL_NAME (t), t); - leave_scope (); - if (!cp_parser_error_occurred (parser) && !error) - done = true; - cp_parser_abort_tentative_parse (parser); - - parser->in_template_argument_list_p = saved_italp; - break; - } - case CPP_CLOSE_PAREN: - case CPP_ELLIPSIS: - /* If we run into a non-nested `;', `}', or `]', - then the code is invalid -- but the default - argument is certainly over. */ - case CPP_SEMICOLON: - case CPP_CLOSE_BRACE: - case CPP_CLOSE_SQUARE: - if (depth == 0) - done = true; - /* Update DEPTH, if necessary. */ - else if (token->type == CPP_CLOSE_PAREN - || token->type == CPP_CLOSE_BRACE - || token->type == CPP_CLOSE_SQUARE) - --depth; - break; - - case CPP_OPEN_PAREN: - case CPP_OPEN_SQUARE: - case CPP_OPEN_BRACE: - ++depth; - break; - - case CPP_LESS: - if (depth == 0) - /* This might be the comparison operator, or it might - start a template argument list. */ - ++maybe_template_id; - break; - -case CPP_RSHIFT: - if (cxx_dialect == cxx98) -break; - /* Fall through for C++0x, which treats the `>>' - operator like two `>' tokens in certain - cases. */ - - case CPP_GREATER: - if (depth == 0) - { - /* This might be an operator, or it might close a - template argument list. But if a previous '<' - started a template argument list, this will have - closed it, so we can't be in one anymore. */ - maybe_template_id -= 1 + (token->type == CPP_RSHIFT); - if (maybe_template_id < 0) - maybe_template_id = 0; - } - break; - - /*
Re: [PATCH] Fix PR51650
On Mon, 2 Jan 2012, Jason Merrill wrote: > It seems like using get_context_die instead of lookup_decl_die might do the > trick. For the idea creating the DIE at the point we process the limbo DIE yes. But would you consider doing that unconditional or only for LTO? I can certainly give it a shot. Just to check, do you mean the following? Thanks, Richard. Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 182784) +++ gcc/dwarf2out.c (working copy) @@ -22501,15 +22501,8 @@ dwarf2out_finish (const char *filename) else if (TYPE_P (node->created_for)) context = TYPE_CONTEXT (node->created_for); - gcc_assert (context - && (TREE_CODE (context) == FUNCTION_DECL - || TREE_CODE (context) == NAMESPACE_DECL)); - - origin = lookup_decl_die (context); - if (origin) - add_child_die (origin, die); - else - add_child_die (comp_unit_die (), die); + origin = get_context_die (context); + add_child_die (origin, die); } } }
[committed] copy-edit -flto documentation
When I was looking at some LTO-related test failures last week (see mail on gcc@), I found that the many punctuation and grammatical errors in the -flto documentation made this section of the GCC manual hard to read. I made a pass over it to clean up the worst problems -- since the changes don't affect content, I've checked in this patch as "obvious". -Sandra 2012-01-02 Sandra Loosemore gcc/ * doc/invoke.texi (-flto and related options): Copy-edit. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 182710) +++ gcc/doc/invoke.texi (working copy) @@ -7763,8 +7763,8 @@ file. When the object files are linked bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit. -To use the link-timer optimizer, @option{-flto} needs to be specified at -compile time and during the final link. For example, +To use the link-time optimizer, @option{-flto} needs to be specified at +compile time and during the final link. For example: @smallexample gcc -c -O2 -flto foo.c @@ -7772,25 +7772,25 @@ gcc -c -O2 -flto bar.c gcc -o myprog -flto -O2 foo.o bar.o @end smallexample -The first two invocations to GCC will save a bytecode representation +The first two invocations to GCC save a bytecode representation of GIMPLE into special ELF sections inside @file{foo.o} and -@file{bar.o}. The final invocation will read the GIMPLE bytecode from -@file{foo.o} and @file{bar.o}, merge the two files into a single -internal image, and compile the result as usual. Since both +@file{bar.o}. The final invocation reads the GIMPLE bytecode from +@file{foo.o} and @file{bar.o}, merges the two files into a single +internal image, and compiles the result as usual. Since both @file{foo.o} and @file{bar.o} are merged into a single image, this -causes all the inter-procedural analyses and optimizations in GCC to +causes all the interprocedural analyses and optimizations in GCC to work across the two files as if they were a single one. This means, -for example, that the inliner will be able to inline functions in +for example, that the inliner is able to inline functions in @file{bar.o} into functions in @file{foo.o} and vice-versa. -Another (simpler) way to enable link-time optimization is, +Another (simpler) way to enable link-time optimization is: @smallexample gcc -o myprog -flto -O2 foo.c bar.c @end smallexample -The above will generate bytecode for @file{foo.c} and @file{bar.c}, -merge them together into a single GIMPLE representation and optimize +The above generates bytecode for @file{foo.c} and @file{bar.c}, +merges them together into a single GIMPLE representation and optimizes them as usual to produce @file{myprog}. The only important thing to keep in mind is that to enable link-time @@ -7800,30 +7800,22 @@ compile and the link commands. To make whole program optimization effective, it is necessary to make certain whole program assumptions. The compiler needs to know what functions and variables can be accessed by libraries and runtime -outside of the link time optimized unit. When supported by the linker, -the linker plugin (see @option{-fuse-linker-plugin}) passes to the -compiler information about used and externally visible symbols. When +outside of the link-time optimized unit. When supported by the linker, +the linker plugin (see @option{-fuse-linker-plugin}) passes information +to the compiler about used and externally visible symbols. When the linker plugin is not available, @option{-fwhole-program} should be -used to allow the compiler to make these assumptions, which will lead +used to allow the compiler to make these assumptions, which leads to more aggressive optimization decisions. Note that when a file is compiled with @option{-flto}, the generated -object file will be larger than a regular object file because it will -contain GIMPLE bytecodes and the usual final code. This means that -object files with LTO information can be linked as a normal object -file. So, in the previous example, if the final link is done with - -@smallexample -gcc -o myprog foo.o bar.o -@end smallexample - -The only difference will be that no inter-procedural optimizations -will be applied to produce @file{myprog}. The two object files -@file{foo.o} and @file{bar.o} will be simply sent to the regular -linker. +object file is larger than a regular object file because it +contains GIMPLE bytecodes and the usual final code. This means that +object files with LTO information can be linked as normal object +files; if @option{-flto} is not passed to the linker, no +interprocedural optimizations are applied. Additionally, the optimization flags used to compile individual files -are not necessarily related to those used at link-time. For instance, +are not necessarily related to those used at link time. For instance, @smallexample gcc -c -O0 -flto
[Patch,AVR]: Fix loading ZERO_REG with 0 in ISR prologue
This is fix for ISR prologue that "cleared" zero reg with mov __zero_reg__,__zero_reg__ The right way is clr __zero_reg__ of course. As CLR does change cc0 notice_update_cc needs to be adapted. Passes testsuite. Moreover, lightly tested on ISR source (there is no ISR test case in test suite). Ok for trunk? Johann * config/avr/avr.md (cc): Add alternative "ldi". (movqi_insn): Use it in cc attribute. * config/avr/avr.c (notice_update_cc): Handle CC_LDI. (output_reload_in_const): Use CLR to move 0 to ZERO_REG. (output_reload_insisf): Use ZERO_REG to pre-clear register. Index: config/avr/avr.md === --- config/avr/avr.md (revision 182796) +++ config/avr/avr.md (working copy) @@ -95,7 +95,7 @@ (define_c_enum "unspecv" ;; Condition code settings. (define_attr "cc" "none,set_czn,set_zn,set_n,compare,clobber, - out_plus, out_plus_noclobber" + out_plus, out_plus_noclobber,ldi" (const_string "none")) (define_attr "type" "branch,branch1,arith,xcall" @@ -584,7 +584,7 @@ (define_insn "movqi_insn" } [(set_attr "length" "1,1,5,5,1,1,4") (set_attr "adjust_len" "mov8") - (set_attr "cc" "none,none,clobber,clobber,none,none,clobber")]) + (set_attr "cc" "ldi,none,clobber,clobber,none,none,clobber")]) ;; This is used in peephole2 to optimize loading immediate constants ;; if a scratch register from LD_REGS happens to be available. Index: config/avr/avr.c === --- config/avr/avr.c (revision 182796) +++ config/avr/avr.c (working copy) @@ -1994,6 +1994,7 @@ notice_update_cc (rtx body ATTRIBUTE_UNU case CC_OUT_PLUS: case CC_OUT_PLUS_NOCLOBBER: +case CC_LDI: { rtx *op = recog_data.operand; int len_dummy, icc; @@ -2001,16 +2002,36 @@ notice_update_cc (rtx body ATTRIBUTE_UNU /* Extract insn's operands. */ extract_constrain_insn_cached (insn); -if (CC_OUT_PLUS == cc) - avr_out_plus (op, &len_dummy, &icc); -else - avr_out_plus_noclobber (op, &len_dummy, &icc); - -cc = (enum attr_cc) icc; - +switch (cc) + { + default: +gcc_unreachable(); + + case CC_OUT_PLUS: +avr_out_plus (op, &len_dummy, &icc); +cc = (enum attr_cc) icc; +break; + + case CC_OUT_PLUS_NOCLOBBER: +avr_out_plus_noclobber (op, &len_dummy, &icc); +cc = (enum attr_cc) icc; +break; + + case CC_LDI: + +cc = (op[1] == CONST0_RTX (GET_MODE (op[0])) + && reg_overlap_mentioned_p (op[0], zero_reg_rtx)) + /* Loading zero-reg with 0 uses CLI and thus clobbers cc0. */ + ? CC_CLOBBER + /* Any other "r,rL" combination does not alter cc0. */ + : CC_NONE; + +break; + } /* inner switch */ + break; } -} +} /* outer swicth */ switch (cc) { @@ -8945,9 +8966,9 @@ avr_regno_mode_code_ok_for_base_p (int r The effect on cc0 is as follows: - Load 0 to any register : NONE - Load ld register with any value : NONE - Anything else: : CLOBBER */ + Load 0 to any register except ZERO_REG : NONE + Load ld register with any value: NONE + Anything else: : CLOBBER */ static void output_reload_in_const (rtx *op, rtx clobber_reg, int *len, bool clear_p) @@ -9062,7 +9083,9 @@ output_reload_in_const (rtx *op, rtx clo if (ival[n] == 0) { if (!clear_p) -avr_asm_len (ldreg_p ? "ldi %0,0" : "mov %0,__zero_reg__", +avr_asm_len (ldreg_p ? "ldi %0,0" + : ZERO_REGNO == REGNO (xdest[n]) ? "clr %0" + : "mov %0,__zero_reg__", &xdest[n], len, 1); continue; } @@ -9224,8 +9247,8 @@ output_reload_insisf (rtx *op, rtx clobb { /* Default needs 4 CLR instructions: clear register beforehand. */ - avr_asm_len ("clr %A0" CR_TAB - "clr %B0" CR_TAB + avr_asm_len ("mov %A0,__zero_reg__" CR_TAB + "mov %B0,__zero_reg__" CR_TAB "movw %C0,%A0", &op[0], len, 3); output_reload_in_const (op, clobber_reg, len, true);
Re: [PATCH] [RFC] PR debug/49951 - jumpy stepping at end of scope in C++
Jason Merrill writes: > OK. I'd like to apply this patch to 4.5 and 4.6 as I find the jumpy stepping behaviour very annoying while using 4.6 myself. I have bootstrapped and tested the patch on these branches for x86_64-unknown-linux-gnu successfully. Would this be OK? For the record, here is the commit I have tested on these branches. commit 399ff82d2ff78acb066b91e5940e815fcdc82ed7 Author: dodji Date: Tue Dec 20 13:36:04 2011 + PR debug/49951 - jumpy stepping at end of scope in C++ gcc/cp/ PR debug/49951 * decl.c (cxx_maybe_build_cleanup): Don't set location of the call to the destructor. gcc/testsuite/ PR debug/49951 * g++.dg/gcov/gcov-2.C: Adjust. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@182532 138bc75d-0d04-0410-961f-82ee72b054a4 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -13363,8 +13363,17 @@ cxx_maybe_build_cleanup (tree decl) cleanup = call; } + /* build_delete sets the location of the destructor call to the + current location, even though the destructor is going to be + called later, at the end of the current scope. This can lead to + a "jumpy" behaviour for users of debuggers when they step around + the end of the block. So let's unset the location of the + destructor call instead. */ + if (cleanup != NULL && EXPR_P (cleanup)) +SET_EXPR_LOCATION (cleanup, UNKNOWN_LOCATION); return cleanup; } /* When a stmt has been parsed, this function is called. */ diff --git a/gcc/testsuite/g++.dg/gcov/gcov-2.C b/gcc/testsuite/g++.dg/gcov/gcov-2.C index 6d002f5..66d8af3 100644 --- a/gcc/testsuite/g++.dg/gcov/gcov-2.C +++ b/gcc/testsuite/g++.dg/gcov/gcov-2.C @@ -20,7 +20,7 @@ private: void foo() { - C c; /* count(2) */ + C c; /* count(1) */ c.seti (1); /* count(1) */ } -- Dodji
Re: [PATCH] [RFC] PR debug/49951 - jumpy stepping at end of scope in C++
On Mon, Jan 02, 2012 at 05:41:28PM +0100, Dodji Seketeli wrote: > I'd like to apply this patch to 4.5 and 4.6 as I find the jumpy stepping > behaviour very annoying while using 4.6 myself. > > I have bootstrapped and tested the patch on these branches > for x86_64-unknown-linux-gnu successfully. > > Would this be OK? Yes. Jakub
Re: Keep static VTA locs in cselib tables only
Hi, this seem to have caused a bootstrap failure on s390x: PR51735 /home/andreas/git/gcc-head/gcc/tree-ssa-pre.c: In function ‘bool insert_aux(basic_block)’: /home/andreas/git/gcc-head/gcc/tree-ssa-pre.c:3791:1: internal compiler error: Segmentation fault Bye, -Andreas- On 11/23/2011 11:10 AM, Alexandre Oliva wrote: > This patch reduces VTA memory consumption and even speeds it up > somewhat, by avoiding recording permanent equivalences in dataflow sets > (and propagating them all the way down the control flow graph), keeping > them in cselib tables only. This saves some micro-operations, some > duplicate attempts to expand the same complex operations, and most of > all time and memory for locations in dataflow sets. > > I've also moved reverse operations and entry values, that are also > permanent equivalences, to cselib tables, introducing a mechanism to add > equivalences to cselib tables that doesn't depend on expressions hashing > equal: instead, locs for values in an equivalence set are grouped in the > loc list for the earliest (canonical) value in the set, in the cselib > tables, with a single entry in the loc list for all other set members > pointing to the canonical value. > > The downside is that we don't sort loc lists in cselib as we do in > var-trackin, so we don't give expressions the same preferences we did > before, which means there's some potential for debug info degradation, > particularly for preferring entry value expressions over concrete > expressions guaranteed to have an available value. I'm going to see > whether sorting gets us better/faster results next, but just sorting > them won't get us all the way: while before we'd sort all equivalences > for the var-tracking-canonical equivalence, we may now fail to merge > location lists because the static equivalences aren't taken into account > when dynamic equivalence sets in var-tracking dataflow sets. I haven't > thought about whether this makes much of a difference, or how to do that > efficiently if desirable, but I figured I wouldn't wait any longer > before submitting this patch for 4.7. > > This was regstrapped on i686-pc-linux-gnu and x86_64-linux-gnu. I've > also run some debug info, memory and compile-time measurements: > > - compiling stage3-gcc/ (--enable-languages=all,ada) became some 1-2% > faster on average (0.5% to 5% speedups were observed over 3 > measurements) > > - comparable speedups with a not-very-random sample of preprocessed > sources that used to be VTA bad-performers, with var-tracking memory use > down by 10% to 50%. > > - compiling stage2 target libs and stage3 host patched sources (with > both unpatched and patched stage2 compiler) produced cc1plus with 10% > fewer entry value expressions (a welcome surprise!), 1% fewer call site > value expressions, an increase of 0.1% in the total number of variables > with location lists and less than 0.5% decrease in variables with full > coverage. > > Here's the patch. Ok to install? > > > > > >
[PING] PR33919/preprocessor fix __BASE_FILE__ when included from the command line
Attached is a suggested fix for a long-standing C pre-processor bug. Ref: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33919 Ref: http://gcc.gnu.org/ml/gcc/2004-10/msg00534.html The patch implements the approach suggested by Harald van Dijk in the cited bug report. I am not familiar with the subtleties of the C pre-processor, and do not know if there may be some surprises or special cases not covered. The fix does appear to provide the expected result as demonstrated by the included test case. I have a specific question re: this new code. + name = _cpp_get_file_name (pfile->main_file); + if (!name) + name = ""; I wasn't sure whether 'name' can have a NULL value, and handled that case as shown above. Would a gcc_assert() be more appropriate, or is it safe to simply assume that the name value is not NULL? Please review. Thanks, - Gary Index: gcc/testsuite/gcc.dg/pr33919-2.h === --- gcc/testsuite/gcc.dg/pr33919-2.h(revision 0) +++ gcc/testsuite/gcc.dg/pr33919-2.h(revision 0) @@ -0,0 +1 @@ +char *nested_inc_base_file = __BASE_FILE__; Index: gcc/testsuite/gcc.dg/pr33919.c === --- gcc/testsuite/gcc.dg/pr33919.c (revision 0) +++ gcc/testsuite/gcc.dg/pr33919.c (revision 0) @@ -0,0 +1,35 @@ +/* PR preprocessor/pr33919 */ +/* { dg-do run } */ +/* { dg-options "-I . -include ${srcdir}/gcc.dg/pr33919-0.h" } */ + +#include "pr33919-1.h" + +typedef __SIZE_TYPE__ size_t; + +const char *base_file = __BASE_FILE__; + +extern int strcmp (const char *, const char *); +extern size_t strlen (const char *); +extern void abort (void); + +#define BASE_NAME "pr33919.c" + +int +main () +{ + size_t file_len = strlen (__FILE__); + size_t basename_len = strlen (BASE_NAME); + if (file_len < basename_len) +abort (); + if (strcmp (__FILE__ + file_len - basename_len, BASE_NAME)) +abort (); + if (strcmp (pre_inc_base_file, __FILE__)) +abort (); + if (strcmp (base_file, __FILE__)) +abort (); + if (strcmp (inc_base_file, __FILE__)) +abort (); + if (strcmp (nested_inc_base_file, __FILE__)) +abort (); + return 0; +} Index: gcc/testsuite/gcc.dg/pr33919-0.h === --- gcc/testsuite/gcc.dg/pr33919-0.h(revision 0) +++ gcc/testsuite/gcc.dg/pr33919-0.h(revision 0) @@ -0,0 +1 @@ +char *pre_inc_base_file = __BASE_FILE__; Index: gcc/testsuite/gcc.dg/pr33919-1.h === --- gcc/testsuite/gcc.dg/pr33919-1.h(revision 0) +++ gcc/testsuite/gcc.dg/pr33919-1.h(revision 0) @@ -0,0 +1,2 @@ +#include "pr33919-2.h" +char *inc_base_file = __BASE_FILE__; Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 182805) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,11 @@ +2012-01-02 Gary Funck + + PR preprocessor/33919 + * gcc.dg/pr33919.c: New test. + * gcc.dg/pr33919-0.h: New test header file. + * gcc.dg/pr33919-1.h: Ditto. + * gcc.dg/pr33919-2.h: Ditto. + 2012-01-02 Paolo Carlini PR c++/20140 Index: libcpp/macro.c === --- libcpp/macro.c (revision 182805) +++ libcpp/macro.c (working copy) @@ -1,7 +1,7 @@ /* Part of CPP library. (Macro and #define handling.) Copyright (C) 1986, 1987, 1989, 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, - 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc. + 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc. Written by Per Bothner, 1994. Based on CCCP program by Paul Rubin, June 1986 Adapted to ANSI C, Richard Stallman, Jan 1987 @@ -278,10 +278,9 @@ _cpp_builtin_macro_text (cpp_reader *pfi pfile->line_table->highest_line); else { - map = linemap_lookup (pfile->line_table, pfile->line_table->highest_line); - while (! MAIN_FILE_P (map)) - map = INCLUDED_FROM (pfile->line_table, map); - name = ORDINARY_MAP_FILE_NAME (map); + name = _cpp_get_file_name (pfile->main_file); + if (!name) + name = ""; } len = strlen (name); buf = _cpp_unaligned_alloc (pfile, len * 2 + 3); Index: libcpp/files.c === --- libcpp/files.c (revision 182805) +++ libcpp/files.c (working copy) @@ -1,7 +1,7 @@ /* Part of CPP library. File handling. Copyright (C) 1986, 1987, 1989, 1992, 1993, 1994, 1995, 1998, - 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 - Free Software Foundation, Inc. + 1999, 2000, 2001, 2002, 2003, 2004, 2005, + 20
C++ PATCH for c++/51675 (DR 1359, constexpr union)
DR 1359 points out that the rules for constexpr constructors require all data members to be initialized, which is wrong for unions. This patch implements the obvious resolution. Tested x86_64-pc-linux-gnu, applying to trunk. commit 82927de9eb3327c2e65be4f56aec991e36c44fa9 Author: Jason Merrill Date: Mon Jan 2 00:40:02 2012 -0500 PR c++/51675 * method.c (walk_field_subobs): Don't check for uninitialized fields in a union. (synthesized_method_walk): Check here. diff --git a/gcc/cp/method.c b/gcc/cp/method.c index 8101f8a..cf2a713 100644 --- a/gcc/cp/method.c +++ b/gcc/cp/method.c @@ -1063,7 +1063,8 @@ walk_field_subobs (tree fields, tree fnname, special_function_kind sfk, /* For an implicitly-defined default constructor to be constexpr, every member must have a user-provided default constructor or an explicit initializer. */ - if (constexpr_p && !CLASS_TYPE_P (mem_type)) + if (constexpr_p && !CLASS_TYPE_P (mem_type) + && TREE_CODE (DECL_CONTEXT (field)) != UNION_TYPE) { *constexpr_p = false; if (msg) @@ -1208,12 +1209,19 @@ synthesized_method_walk (tree ctype, special_function_kind sfk, bool const_p, resolution, so a constructor can be trivial even if it would otherwise call a non-trivial constructor. */ if (expected_trivial - && !diag && (!copy_arg_p || cxx_dialect < cxx0x)) { if (constexpr_p && sfk == sfk_constructor) - *constexpr_p = trivial_default_constructor_is_constexpr (ctype); - return; + { + bool cx = trivial_default_constructor_is_constexpr (ctype); + *constexpr_p = cx; + if (diag && !cx && TREE_CODE (ctype) == UNION_TYPE) + /* A trivial constructor doesn't have any NSDMI. */ + inform (input_location, "defaulted default constructor does " + "not initialize any non-static data member"); + } + if (!diag) + return; } ++cp_unevaluated_operand; diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-union2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-union2.C new file mode 100644 index 000..0bf2aa7 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-union2.C @@ -0,0 +1,18 @@ +// PR c++/51675 +// { dg-options -std=c++0x } + +union foo +{ + int x = 0; + short y; + + constexpr foo() = default; +}; + +union bar +{ + int x; + short y; + + constexpr bar() = default; // { dg-error "constexpr" } +};
[RFC][patch] trans-mem: mark transaction begins as returns-twice
Attached is Richard Henderson's patch that marks calls that start a transaction as returns-twice. In turn, these calls will be treated like a call to setjmp. This was motivated by the miscompilation of one of the STAMP applications (Genome), where a stack slot was used as temp storage for a CPU register but not restored when the transaction got aborted and restarted (then, after restart, the program crashed because it used inconsistent data). With the attached patch and in this particular example, the stack slots that are written to in the transaction do not get read during the transaction. (-fno-caller-saves was not a sufficient solution, BTW.) AFAICT, we previously wanted to handle "restart safety" by adding abnormal edges to all calls to the TM runtime library (which could in turn call the libitm longjmp that actually restarts a transaction). Richard mentioned that we could drop this code (I think he meant this, and others pieces perhaps) if the returns-twice approach works. BTW, did we also add abnormal edges to any call to transactional clones, which could in turn call the TM runtime library? Conceptually, what we would need to enforce is that all stack slots that are live-in into a transaction (i.e., the live range crosses a call to _ITM_beginTransaction) do not get reused until the matching call to _ITM_commitTransaction. Alternatively, we can reuse those, but need to save and restore them after aborts. As Richard suggested, I looked at the handling of REG_SETJMP (see the summary below). This looks like a sufficient solution to me (eg, don't share stack slots if regs cross a setjmp) but I don't know nearly enough about this to really understand whether it is. Can somebody else have a look please? In particular: - Is it really sufficient? - Is this too conservative (e.g., no CSE at all)? If so, we should look at this again for 4.8 I guess, otherwise every function that uses a transaction will be slower, and slowing down nontransactional code isn't nice. Or won't that be much of a slowdown? - Do we potentially get unnecessary warnings (if vars are live across a transaction begin)? I didn't get such warnings in the STAMP app that wasn't working though. Does anyone has suggestions for a test case? - For the long-term, should we try to limit the setjmp effects to the TM region? Contrary to standard setjmp/longjmp, we know about the regions and where the longjmps matching a setjmp are. Any thoughts on how much this might matter performance-wise? Overall, I think we should still use it, unless we can come up with another fix in time for 4.7. Opinions? Torvald Summary: - cprop.c: constant propagation only runs if !cfun->calls_setjmp - gcse.c: Same. - cse.c: we don't CSE over a call to setjmp. But comment associates this just with some weird longjmp on VAX and other systems. - cselib.c (cselib_process_insn): Forget everything at a setjmp. - callers of this function are: dse_step1(), local_cprop_pass(), reload_cse_regs(), thread_jump(), vt_initialize() - ira-lives.c (process_bb_node_lives): ??? Don't allocate allocnos that cross setjmps. - regstat.c (regstat_bb_compute_ri): Marks all pseudo(?) regs that cross a setjmp. Clients of this information: - (regstat_compute_ri): ??? REG_LIVE_LENGTH(regno) = -1; REG_BASIC_BLOCK(regno) = REG_BLOCK_UNKNOWN; - function.c (generate_setjmp_warnings): Warnings if vars are live across setjmp. - ira-color.c (coalesce_spill_slots): ??? Don't share stack slots if one of the regs crosses a setjmp? - reload1.c (reload_as_needed): Don't reuse any reload reg contents across a setjmp. - reload.c (find_equiv_reg): Don't reuse register contents from before a setjmp-type function call. - resource.c (mark_referenced_resources, mark_set_resources): setjmp uses all registers. - sched-deps.c: setjmp acts as a MOVE_BARRIER. What does this barrier do precisely? commit 53e0ab5b86ca38d55e4c4fd927be33143586ad15 Author: Torvald Riegel Date: Mon Dec 19 23:36:45 2011 +0100 rth's returns-twice fix diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def index 619794e..c3132cc 100644 --- a/gcc/builtin-attrs.def +++ b/gcc/builtin-attrs.def @@ -98,6 +98,7 @@ DEF_ATTR_IDENT (ATTR_STRFTIME, "strftime") DEF_ATTR_IDENT (ATTR_TYPEGENERIC, "type generic") DEF_ATTR_IDENT (ATTR_TM_REGPARM, "*tm regparm") DEF_ATTR_IDENT (ATTR_TM_TMPURE, "transaction_pure") +DEF_ATTR_IDENT (ATTR_RETURNS_TWICE, "returns_twice") DEF_ATTR_TREE_LIST (ATTR_NOVOPS_LIST, ATTR_NOVOPS, ATTR_NULL, ATTR_NULL) @@ -241,6 +242,8 @@ DEF_ATTR_TREE_LIST (ATTR_TM_NORETURN_NOTHROW_LIST, ATTR_TM_REGPARM, ATTR_NULL, ATTR_NORETURN_NOTHROW_LIST) DEF_ATTR_TREE_LIST (ATTR_TM_CONST_NOTHROW_LIST, ATTR_TM_REGPARM, ATTR_NULL, ATTR_CONST_NOTHROW_LIST) +DEF_ATTR_TREE_LIST (ATTR_TM_NOTHROW_RT_LIST, + ATTR_RETURNS_TWICE, ATTR_NULL, ATTR_TM_NOTHROW_LIST) /* Same attributes used for BUILT_IN_MA
Re: [PATCH] Fix make_relative_prefix_1 (PR driver/48306)
On Wed, Dec 21, 2011 at 6:43 AM, Jakub Jelinek wrote: > 2011-12-21 Jakub Jelinek > > * make-relative-prefix.c (make_relative_prefix_1): Avoid > stack overflow if PATH contains just a single entry and > HOST_EXECUTABLE_SUFFIX needs to be used. > > PR driver/48306 > * make-relative-prefix.c: Include sys/stat.h. > (make_relative_prefix_1): If access succeeds, check also stat > if nstore is a regular file. This is OK. Thanks. Ian
Re: [RFC][libitm] Convert to c++11 atomics
On Tue, 2011-12-20 at 01:13 +0100, Torvald Riegel wrote: > On Mon, 2011-12-19 at 15:17 -0800, Richard Henderson wrote: > > On 12/19/2011 02:58 PM, Torvald Riegel wrote: > > > In the particular case (the validated loads technique used in > > > method-gl.cc, load(), store(), and validate()), we actually do not need > > > to have loads or stores to be really atomic, but need the compiler to > > > treat them as if they were atomics wrt. to reordering etc. (e.g., wrt. > > > adjacent fences). Right now, I'm relying on the fact that GCC doesn't > > > optimize atomics yet and am just using nonatomic loads/stores. > > > > For 4.7, these accesses need to be volatile if they're to interact > > with the adjacent atomic ops. > > What do you mean by "interact"? AFAIU Andrew, currently atomics act as > full optimization barriers, leaving nonatomic memory accesses in place > somewhere between the surrounding atomic accesses. Andrew, can you please comment on this? Full context: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01427.html
Re: [Patch,AVR]: Fix loading ZERO_REG with 0 in ISR prologue
2012/1/2 Georg-Johann Lay : > This is fix for ISR prologue that "cleared" zero reg with > > mov __zero_reg__,__zero_reg__ > > The right way is > > clr __zero_reg__ > > of course. As CLR does change cc0 notice_update_cc needs to be adapted. > > Passes testsuite. Moreover, lightly tested on ISR source (there is no ISR test > case in test suite). > > Ok for trunk? > > Johann > > * config/avr/avr.md (cc): Add alternative "ldi". > (movqi_insn): Use it in cc attribute. > * config/avr/avr.c (notice_update_cc): Handle CC_LDI. > (output_reload_in_const): Use CLR to move 0 to ZERO_REG. > (output_reload_insisf): Use ZERO_REG to pre-clear register. > Please commit. Denis.
Re: [PATCH] Fix PR tree-optimization/51315
Eric Botcazou writes: >> You are basically (but non-optimally) locally re-implementing >> what expr.c:get_object_or_type_alignment does, for the >> bare MEM_REF case (you don't consider offsets that >> do not change the alignment, nor alignment information >> present on the SSA name). > > Adjusted version attached, regression-tested on SPARC/Solaris. I'll do a > full > testing cycle if it is OK for mainline. > > get_object_or_type_alignment doesn't exist on the 4.6 branch, the expanders > for > MEM_REF and TARGET_MEM_REF do: > > MAX (TYPE_ALIGN (TREE_TYPE (exp)), >get_object_alignment (exp, BIGGEST_ALIGNMENT))) > > instead, so I presume I should do the same for them in tree_non_aligned_mem_p? > > > 2011-12-07 Eric Botcazou > > PR tree-optimization/51315 > * tree.h (get_object_or_type_alignment): Declare. > * expr.c (get_object_or_type_alignment): Move to... > * builtins.c (get_object_or_type_alignment): ...here. Add assertion. > * tree-sra.c (tree_non_mode_aligned_mem_p): Rename to... > (tree_non_aligned_mem_p): ...this. Add ALIGN parameter. Look into > MEM_REFs and use get_object_or_type_alignment for them. > (build_accesses_from_assign): Adjust for above change. > (access_precludes_ipa_sra_p): Likewise. Sorry for the late notice, but this regressed memcpy-1.c for n32 and n64 on mips64-linux-gnu. I realise memcpy-1.c isn't viewed as being a proper SRA test, but in this case I think it really is showing a genuine problem. We have: struct a {int a,b,c;} a; int test(struct a a) { struct a nasty_local; __builtin_memcpy (&nasty_local,&a, sizeof(a)); return nasty_local.a; } We apply LOCAL_ALIGNMENT to nasty_local during estimated_stack_frame_size, so we have a VAR_DECL (nasty_local) with 64-bit alignment and a PARM_DECL (a) with 32-bit alignment. This fails the condition: if (STRICT_ALIGNMENT && tree_non_aligned_mem_p (rhs, get_object_alignment (lhs))) lacc->grp_unscalarizable_region = 1; because LHS has 64-bit alignment but RHS has 32-bit alignment. Richard
Re: [PATCH] Fix PR51650
On 01/02/2012 10:49 AM, Richard Guenther wrote: For the idea creating the DIE at the point we process the limbo DIE yes. But would you consider doing that unconditional or only for LTO? Unconditional. Anything that already passed the assert should be unaffected by the change. I can certainly give it a shot. Just to check, do you mean the following? Yes. Jason
[PATCH, ia64]: Fix PR 51681, ICE in gcc.dg/torture/vshuf-v2si.c (+ more)
Hello! Attached patch fixes a couple of thinkos in new ia64 vector permutation code. 2012-01-02 Uros Bizjak PR target/51681 * config/ia64/ia64.c (expand_vec_perm_shrp): Use correct operands for shrp pattern. Correctly handle and fixup shift variable. 2012-01-02 Uros Bizjak * config/ia64/ia64.c (expand_vec_perm_broadcast): Use correct operands for extzv pattern. Patch was bootstrapped and regression tested on ia64-unknown-linux-gnu, and fixes following testsuite failures: FAIL: gcc.dg/torture/vshuf-v2si.c -O2 (internal compiler error) FAIL: gcc.dg/torture/vshuf-v2si.c -O2 (test for excess errors) WARNING: gcc.dg/torture/vshuf-v2si.c -O2 compilation failed to produce executable FAIL: gcc.dg/torture/vshuf-v8qi.c -O2 execution test OK for mainline? Uros. Index: config/ia64/ia64.c === --- config/ia64/ia64.c (revision 182780) +++ config/ia64/ia64.c (working copy) @@ -11085,7 +11085,7 @@ static bool expand_vec_perm_shrp (struct expand_vec_perm_d *d) { unsigned i, nelt = d->nelt, shift, mask; - rtx tmp, op0, op1;; + rtx tmp, hi, lo; /* ??? Don't force V2SFmode into the integer registers. */ if (d->vmode == V2SFmode) @@ -11101,6 +11101,11 @@ expand_vec_perm_shrp (struct expand_vec_perm_d *d) if (d->testing_p) return true; + hi = shift < nelt ? d->op1 : d->op0; + lo = shift < nelt ? d->op0 : d->op1; + + shift %= nelt; + shift *= GET_MODE_UNIT_SIZE (d->vmode) * BITS_PER_UNIT; /* We've eliminated the shift 0 case via expand_vec_perm_identity. */ @@ -3,11 +8,9 @@ expand_vec_perm_shrp (struct expand_vec_perm_d *d) shift = 64 - shift; tmp = gen_reg_rtx (DImode); - op0 = (shift < nelt ? d->op0 : d->op1); - op1 = (shift < nelt ? d->op1 : d->op0); - op0 = gen_lowpart (DImode, op0); - op1 = gen_lowpart (DImode, op1); - emit_insn (gen_shrp (tmp, op0, op1, GEN_INT (shift))); + hi = gen_lowpart (DImode, hi); + lo = gen_lowpart (DImode, lo); + emit_insn (gen_shrp (tmp, hi, lo, GEN_INT (shift))); emit_move_insn (d->target, gen_lowpart (d->vmode, tmp)); return true; @@ -11207,7 +11210,7 @@ expand_vec_perm_broadcast (struct expand_vec_perm_ elt *= BITS_PER_UNIT; temp = gen_reg_rtx (DImode); emit_insn (gen_extzv (temp, gen_lowpart (DImode, d->op0), - GEN_INT (elt), GEN_INT (8))); + GEN_INT (8), GEN_INT (elt))); emit_insn (gen_mux1_brcst_qi (d->target, gen_lowpart (QImode, temp))); break;
[PATCH] Fix up recent bootstrap regressions in cselib (PR bootstrap/51725)
Hi! As discussed in the PR, the problem here is that when add_mem_for_addr calls new_elt_loc_list, it adds the new MEM loc to canonical_cselib_val's locs, so when mem_elt isn't canonical, we enter into addr_list as well as first_containing_mem chain some VALUE that doesn't contain any MEM locs. The check to avoid the duplicates isn't working either, because the only loc non-canonical values have is the canonical VALUE. While perhaps some more changes are needed and perhaps cselib_invalidate_mem might need to call canonical_cselib_val too, I believe this patch is correct and desirable and fixes both the ICEs I've analysed. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-01-02 Jakub Jelinek PR bootstrap/51725 * cselib.c (add_mem_for_addr): Call canonical_cselib_val on mem_elt first. --- gcc/cselib.c.jj 2012-01-01 19:54:46.0 +0100 +++ gcc/cselib.c2012-01-02 17:34:16.982499767 +0100 @@ -1264,6 +1264,8 @@ add_mem_for_addr (cselib_val *addr_elt, { struct elt_loc_list *l; + mem_elt = canonical_cselib_val (mem_elt); + /* Avoid duplicates. */ for (l = mem_elt->locs; l; l = l->next) if (MEM_P (l->loc) Jakub
Re: [PATCH, ia64]: Fix PR 51681, ICE in gcc.dg/torture/vshuf-v2si.c (+ more)
On 01/03/2012 06:47 AM, Uros Bizjak wrote: >if (d->testing_p) > return true; > > + hi = shift < nelt ? d->op1 : d->op0; > + lo = shift < nelt ? d->op0 : d->op1; > + > + shift %= nelt; This bit only works for little-endian. It's why I had that assert for shift range 1-63, for one thing. I guess an extra check for big-endian before the loop could fix that. I.e. shift = d->perm[0]; + if (BYTES_BIG_ENDIAN && shift > nelt) +return false; r~
Re: [RFC][libitm] Convert to c++11 atomics
On 12/20/2011 11:13 AM, Torvald Riegel wrote: > On Mon, 2011-12-19 at 15:17 -0800, Richard Henderson wrote: >> On 12/19/2011 02:58 PM, Torvald Riegel wrote: >>> In the particular case (the validated loads technique used in >>> method-gl.cc, load(), store(), and validate()), we actually do not need >>> to have loads or stores to be really atomic, but need the compiler to >>> treat them as if they were atomics wrt. to reordering etc. (e.g., wrt. >>> adjacent fences). Right now, I'm relying on the fact that GCC doesn't >>> optimize atomics yet and am just using nonatomic loads/stores. >> >> For 4.7, these accesses need to be volatile if they're to interact >> with the adjacent atomic ops. > > What do you mean by "interact"? AFAIU Andrew, currently atomics act as > full optimization barriers, leaving nonatomic memory accesses in place > somewhere between the surrounding atomic accesses. I'm not really certain what I was thinking here. If there are actual fences, then we'll have other optimization barriers in place. The volatility (or not) of the non-atomic operations doesn't enter into it. Sorry for the confusion. r~
Re: [PATCH] Fix up recent bootstrap regressions in cselib (PR bootstrap/51725)
On 01/03/2012 06:59 AM, Jakub Jelinek wrote: > PR bootstrap/51725 > * cselib.c (add_mem_for_addr): Call canonical_cselib_val > on mem_elt first. Ok. r~
[C++ PATCH] Add CLEANUP_POINT_EXPRs around OMP_PARALLEL/TASK/FOR if needed (PR c++/51669)
Hi! The f1 function in the testcase fails newly starting with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181332 because there is no CLEANUP_POINT_EXPR around OMP_PARALLEL/OMP_TASK/OMP_FOR whose IF/FINAL/NUM_THREADS/SCHEDULE clause expression might need some cleanups. But as the testcase shows, it isn't hard to write a testcase where even 4.6 or 4.2 ICEs too. It isn't clear where to put the CLEANUP_POINT_EXPR though, OpenMP 3.1 says just (e.g. for parallel): "The if clause expression and the num_threads clause expression are evaluated in the context outside of the parallel construct, and no ordering of those evaluations is specified. It is also unspecified whether, in what order, or how many times any side-effects of the evaluation of the num_threads or if clause expressions occur." Attached are two different patches, the first one puts the CLEANUP_POINT_EXPR around the whole OMP_PARALLEL etc. stmt, the second one wraps the individual clause expressions into CLEANUP_POINT_EXPR. Both patches have been bootstrapped/regtested on x86_64-linux and i686-linux, which one is preferrable? Jakub 2012-01-02 Jakub Jelinek PR c++/51669 * tree.h (find_omp_clause): New prototype. * tree-flow.h (find_omp_clause): Remove prototype. * c-parser.c (c_parser_omp_for_loop): Call add_stmt if c_finish_omp_for returned non-NULL. c-family/ * c-omp.c (c_parser_omp_for_loop): Don't call add_stmt here. cp/ * semantics.c (finish_omp_parallel): If there is IF or NUM_THREADS clause, call add_stmt on maybe_cleanup_point_expr result. (finish_omp_task): If there is IF or FINAL clause, call add_stmt on maybe_cleanup_point_expr result. (finish_omp_for): If there is SCHEDULE clause, call maybe_cleanup_point_expr. Call add_stmt if omp_for is non-NULL. testsuite/ * g++.dg/gomp/pr51669.C: New test. --- gcc/tree.h.jj 2011-12-16 08:37:45.0 +0100 +++ gcc/tree.h 2012-01-02 16:59:18.037923291 +0100 @@ -5866,6 +5866,9 @@ extern unsigned HOST_WIDE_INT compute_bu extern unsigned HOST_WIDE_INT highest_pow2_factor (const_tree); extern tree build_personality_function (const char *); +/* In omp-low.c. */ +extern tree find_omp_clause (tree, enum omp_clause_code); + /* In trans-mem.c. */ extern tree build_tm_abort_call (location_t, bool); extern bool is_tm_safe (const_tree); --- gcc/tree-flow.h.jj 2011-11-29 08:58:52.0 +0100 +++ gcc/tree-flow.h 2012-01-02 16:57:46.538512564 +0100 @@ -392,7 +392,6 @@ extern struct omp_region *new_omp_region struct omp_region *); extern void free_omp_regions (void); void omp_expand_local (basic_block); -extern tree find_omp_clause (tree, enum omp_clause_code); tree copy_var_decl (tree, tree, tree); /*--- --- gcc/c-parser.c.jj 2011-12-21 08:43:48.0 +0100 +++ gcc/c-parser.c 2012-01-02 17:13:17.187988709 +0100 @@ -10082,6 +10082,7 @@ c_parser_omp_for_loop (location_t loc, stmt = c_finish_omp_for (loc, declv, initv, condv, incrv, body, NULL); if (stmt) { + add_stmt (stmt); if (par_clauses != NULL) { tree *c; --- gcc/c-family/c-omp.c.jj 2011-10-12 20:28:08.0 +0200 +++ gcc/c-family/c-omp.c2012-01-02 17:12:54.706121214 +0100 @@ -576,7 +576,7 @@ c_finish_omp_for (location_t locus, tree OMP_FOR_PRE_BODY (t) = pre_body; SET_EXPR_LOCATION (t, locus); - return add_stmt (t); + return t; } } --- gcc/cp/semantics.c.jj 2012-01-01 19:54:46.0 +0100 +++ gcc/cp/semantics.c 2012-01-02 17:16:08.640980769 +0100 @@ -4384,7 +4384,7 @@ begin_omp_parallel (void) tree finish_omp_parallel (tree clauses, tree body) { - tree stmt; + tree stmt, ret; body = finish_omp_structured_block (body); @@ -4392,8 +4392,13 @@ finish_omp_parallel (tree clauses, tree TREE_TYPE (stmt) = void_type_node; OMP_PARALLEL_CLAUSES (stmt) = clauses; OMP_PARALLEL_BODY (stmt) = body; + ret = stmt; + if (find_omp_clause (clauses, OMP_CLAUSE_IF) + || find_omp_clause (clauses, OMP_CLAUSE_NUM_THREADS)) +stmt = maybe_cleanup_point_expr (stmt); + add_stmt (stmt); - return add_stmt (stmt); + return ret; } tree @@ -4406,7 +4411,7 @@ begin_omp_task (void) tree finish_omp_task (tree clauses, tree body) { - tree stmt; + tree stmt, ret; body = finish_omp_structured_block (body); @@ -4414,8 +4419,13 @@ finish_omp_task (tree clauses, tree body TREE_TYPE (stmt) = void_type_node; OMP_TASK_CLAUSES (stmt) = clauses; OMP_TASK_BODY (stmt) = body; + ret = stmt; + if (find_omp_clause (clauses, OMP_CLAUSE_IF) + || find_omp_clause (clauses, OMP_CLAUSE_FINAL)) +stmt = maybe_cleanup_point_expr (stmt); + add_stmt (stmt); - return add_stmt (stmt); + return ret; } /* Helper function for fi
[PATCH] Fix gimple_ic if adding a noreturn direct call variant to an indirect not noreturn call (PR tree-optimization/51719)
Hi! When gimple_ic is changing an indirect call into if (cond) direct_call (); else (*indirect_call) () and the indirect_call is not noreturn, but direct_call is noreturn, we ICE during checking after profile, because the noreturn call still has lhs set and edges that it shouldn't have. The following patch fixes, it, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-01-02 Jakub Jelinek PR tree-optimization/51719 * value-prof.c (gimple_ic): When indirect call isn't noreturn, but direct call is, clear direct call's lhs and don't add fallthrough edge from dcall_bb to join_bb and PHIs. * g++.dg/tree-prof/pr51719.C: New test. --- gcc/value-prof.c.jj 2011-08-18 08:36:00.0 +0200 +++ gcc/value-prof.c2012-01-02 19:01:33.760867971 +0100 @@ -1149,7 +1149,7 @@ gimple_ic (gimple icall_stmt, struct cgr tree optype = build_pointer_type (void_type_node); edge e_cd, e_ci, e_di, e_dj = NULL, e_ij; gimple_stmt_iterator gsi; - int lp_nr; + int lp_nr, dflags; cond_bb = gimple_bb (icall_stmt); gsi = gsi_for_stmt (icall_stmt); @@ -1176,6 +1176,9 @@ gimple_ic (gimple icall_stmt, struct cgr update_stmt (icall_stmt); dcall_stmt = gimple_copy (icall_stmt); gimple_call_set_fndecl (dcall_stmt, direct_call->decl); + dflags = flags_from_decl_or_type (direct_call->decl); + if ((dflags & ECF_NORETURN) != 0) +gimple_call_set_lhs (dcall_stmt, NULL_TREE); gsi_insert_before (&gsi, dcall_stmt, GSI_SAME_STMT); /* Fix CFG. */ @@ -1220,17 +1223,23 @@ gimple_ic (gimple icall_stmt, struct cgr if (e_ij != NULL) { - e_dj = make_edge (dcall_bb, join_bb, EDGE_FALLTHRU); - e_dj->probability = REG_BR_PROB_BASE; - e_dj->count = count; + if ((dflags & ECF_NORETURN) != 0) + e_ij->count = all; + else + { + e_dj = make_edge (dcall_bb, join_bb, EDGE_FALLTHRU); + e_dj->probability = REG_BR_PROB_BASE; + e_dj->count = count; + e_ij->count = all - count; + } e_ij->probability = REG_BR_PROB_BASE; - e_ij->count = all - count; } /* Insert PHI node for the call result if necessary. */ if (gimple_call_lhs (icall_stmt) - && TREE_CODE (gimple_call_lhs (icall_stmt)) == SSA_NAME) + && TREE_CODE (gimple_call_lhs (icall_stmt)) == SSA_NAME + && (dflags & ECF_NORETURN) == 0) { tree result = gimple_call_lhs (icall_stmt); gimple phi = create_phi_node (result, join_bb); --- gcc/testsuite/g++.dg/tree-prof/pr51719.C.jj 2012-01-02 19:11:27.633358952 +0100 +++ gcc/testsuite/g++.dg/tree-prof/pr51719.C2012-01-02 19:12:27.797004399 +0100 @@ -0,0 +1,27 @@ +// PR tree-optimization/51719 +// { dg-options "-O -fpartial-inlining" } + +int +bar (void) +{ + throw 1; +} + +int __attribute__ ((noinline, noclone)) +foo (int (*f) (void)) +{ + try + { +return (*f) (); + } + catch (...) + { + } + return 0; +} + +int +main () +{ + return foo (bar); +} Jakub
[C++ Patch] PR 15867
Hi, another old PR, about -Wredundant-decls (not in -Wall, nor in -Wextra, thus safe bootstrap-wise). The issue is that we are emitting a bogus warning for a declaration followed by a specialization. The fix seems easy, just check ! DECL_TEMPLATE_SPECIALIZATION on newdecl; then however, don't miss redundant declarations for the specialization itself, thus the OR with DECL_TEMPLATE_SPECIALIZATION on olddecl. The testcase checks both situations. Tested x86_64-linux. Thanks, Paolo. /cp 2011-01-02 Paolo Carlini PR c++/15867 * decl.c (duplicate_decls): With -Wredundant-decls don't warn for declaration followed by specialization. /testsuite 2011-01-02 Paolo Carlini PR c++/15867 * g++.dg/warn/Wredundant-decls-spec.C: New. Index: testsuite/g++.dg/warn/Wredundant-decls-spec.C === --- testsuite/g++.dg/warn/Wredundant-decls-spec.C (revision 0) +++ testsuite/g++.dg/warn/Wredundant-decls-spec.C (revision 0) @@ -0,0 +1,12 @@ +// PR c++/15867 +// { dg-options -Wredundant-decls } + +template struct S +{ + void foo() {} +}; + +template<> void S::foo(); + +template<> void S::foo(); // { dg-warning "previous declaration" } +template<> void S::foo(); // { dg-warning "redundant redeclaration" } Index: cp/decl.c === --- cp/decl.c (revision 182810) +++ cp/decl.c (working copy) @@ -1698,7 +1698,10 @@ duplicate_decls (tree newdecl, tree olddecl, bool /* Don't warn about extern decl followed by definition. */ && !(DECL_EXTERNAL (olddecl) && ! DECL_EXTERNAL (newdecl)) /* Don't warn about friends, let add_friend take care of it. */ - && ! (newdecl_is_friend || DECL_FRIEND_P (olddecl))) + && ! (newdecl_is_friend || DECL_FRIEND_P (olddecl)) + /* Don't warn about declaration followed by specialization. */ + && (! DECL_TEMPLATE_SPECIALIZATION (newdecl) + || DECL_TEMPLATE_SPECIALIZATION (olddecl))) { warning (OPT_Wredundant_decls, "redundant redeclaration of %qD in same scope", newdecl); warning (OPT_Wredundant_decls, "previous declaration of %q+D", olddecl);
Re: [C++ PATCH] Add CLEANUP_POINT_EXPRs around OMP_PARALLEL/TASK/FOR if needed (PR c++/51669)
On 01/03/2012 09:10 AM, Jakub Jelinek wrote: > Attached are two different patches, the first one puts the > CLEANUP_POINT_EXPR around the whole OMP_PARALLEL etc. stmt, the second > one wraps the individual clause expressions into CLEANUP_POINT_EXPR. > > Both patches have been bootstrapped/regtested on x86_64-linux and > i686-linux, which one is preferrable? I don't have a real preference. I merely note that the second patch is smaller and less complex, which would tend to lean me that direction. Jason? r~
MAINTAINERS: Add Self
Hello Everyone, Just added myself under Write After Approval. I am cut and pasting the patch below. Here is the Changelog entry: 2012-01-02 Balaji V. Iyer * MAINTAINERS (Write After Approval): Add myself. Index: MAINTAINERS === --- MAINTAINERS (revision 182820) +++ MAINTAINERS (working copy) @@ -392,6 +392,7 @@ Andrew John Hughes gnu_and...@member.fsf.org Andy Hutchinson hutchinsona...@aim.com Bernardo Innocenti ber...@develer.com +Balaji V. Iyer bvi...@gmail.com Daniel Jacobowitz d...@false.org Andreas Jaeger a...@suse.de Harsha Jagasia harsha.jaga...@amd.com Thanks, Balaji V. Iyer.
Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)
On Mon, Jan 2, 2012 at 3:30 PM, Richard Sandiford wrote: > Ayal Zaks writes: >> + for (i = 0; i < ira_pressure_classes_num; i++) >> + { >> + enum reg_class pressure_class; >> + >> + pressure_class = ira_pressure_classes[i]; >> + >> + if (max_reg_pressure[pressure_class] == 0) >> + continue; >> + >> + if (dump_file) >> + fprintf (dump_file, "%s=%d %d ", reg_class_names[pressure_class], >> + max_reg_pressure[pressure_class], >> + ira_available_class_regs[pressure_class]); >> + >> + if (max_reg_pressure[pressure_class] >> + > ira_class_hard_regs_num[pressure_class]) >> + { >> + if (dump_file) >> + fprintf (dump_file, "(pressure)\n"); >> + >> + pressure_p = true; >> >> you can "break;" now. > > FWIW, I thought the same thing at first, but I think Revital's way is better. > It's nice to know when looking at dumps whether there is excess pressure > in more than one class. This isn't performance-critical code after all. > ok >> however, you have everything setup to compute the amount of spill, so >> suggest to do the following instead: >> >> pressure += max_reg_pressure[pressure_class] >> - ira_class_hard_regs_num[pressure_class]); >> >> or better call the variable "spillage" instead of "pressure", and >> return it. Current use will only check if it's zero or positive. > > I read this suggestion in the same way as Revital seems to have done: > that we sum the pressure change over all classes. But that isn't the idea. > Using N too many registers in one pressure class cannot be mitigated by > leaving N registers in another pressure class unused. > of-course (wasn't thinking of spilling from one register file to another); meant to suggest doing: if (max_reg_pressure[pressure_class] > ira_class_hard_regs_num[pressure_class]) spillage += max_reg_pressure[pressure_class] - ira_class_hard_regs_num[pressure_class]); > TBH, the original version of this function looked better to me. > ok, it would surely suffice for now. >> Future use, as discussed offline, should compare this to the amount of >> spillage the original loop (w/o modulo-scheduling) is expected to >> have. > > I wonder if that's really worth pursuing though. For one thing, > we've no control over where the spill code is going to be inserted, > so the final ii is going to be a little arbitrary. That's less of a > problem without SMS because we're able to reschedule the code after > register allocation in order to move spills around. Indeed; that's the main motivation for the second sched pass. > > For another thing, the pressure of the original (unscheduled) code isn't > necessarily indicative of what can be achieved by the normal scheduler > with something like -fsched-pressure. We can't assume that the original > block has minimum pressure. > > If we're serious about wanting to use SMS in high-pressure loops, > perhaps we should consider trying to split live ranges in SMS itself. > I'm not sure it makes sense to leave an SMSed loop that we "know" > is going to need spill code. > > Richard Agreed. Suggest to have concrete testcases guide further development. Thanks, Ayal.
Re: [C++ PATCH] Add CLEANUP_POINT_EXPRs around OMP_PARALLEL/TASK/FOR if needed (PR c++/51669)
On 01/02/2012 05:49 PM, Richard Henderson wrote: On 01/03/2012 09:10 AM, Jakub Jelinek wrote: Attached are two different patches, the first one puts the CLEANUP_POINT_EXPR around the whole OMP_PARALLEL etc. stmt, the second one wraps the individual clause expressions into CLEANUP_POINT_EXPR. Both patches have been bootstrapped/regtested on x86_64-linux and i686-linux, which one is preferrable? I don't have a real preference. I merely note that the second patch is smaller and less complex, which would tend to lean me that direction. The second patch is also more correct, let's go with that one. Jason
Re: [C++ Patch] PR 15867
OK. Jason
Fix cross-builds broken from C++-creep
All cross-builds are "still" done as C. In C++ you don't need the missing struct qualifier or the typedef in "typedef struct gfc_expr ... gfc_expr;" (the struct declaration suffices) as there's no separate struct namespace IIUC. Doesn't this show a bug in the compatibility warning system, or is that turned off when bootstrapping as C++? Anyway, committed as obvious after a cris-elf build has passed the point of failure, which looked as follows (first five lines of errors). ... gcc -c -DIN_GCC_FRONTEND -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition -Wc++-compat -fno-common -DHAVE_CONFIG_H -I. -Ifortran -I/tmp/fail0102-96/gcc/gcc -I/tmp/fail0102-96/gcc/gcc/fortran -I/tmp/fail0102-96/gcc/gcc/../include -I/tmp/fail0102-96/gcc/gcc/../libcpp/include -I/tmp/fail0102-96/gccobj/./gmp -I/tmp/fail0102-96/gcc/gmp -I/tmp/fail0102-96/gccobj/./mpfr -I/tmp/fail0102-96/gcc/mpfr -I/tmp/fail0102-96/gcc/mpc/src -I/tmp/fail0102-96/gcc/gcc/../libdecnumber -I/tmp/fail0102-96/gcc/gcc/../libdecnumber/dpd -I../libdecnumber /tmp/fail0102-96/gcc/gcc/fortran/arith.c -o fortran/arith.o In file included from /tmp/fail0102-96/gcc/gcc/fortran/arith.c:31: /tmp/fail0102-96/gcc/gcc/fortran/gfortran.h:1702: error: expected specifier-qualifier-list before 'gfc_expr' /tmp/fail0102-96/gcc/gcc/fortran/arith.c: In function 'gfc_arith_not': /tmp/fail0102-96/gcc/gcc/fortran/arith.c:418: error: 'gfc_expr' has no member named 'value' /tmp/fail0102-96/gcc/gcc/fortran/arith.c:418: error: 'gfc_expr' has no member named 'value' /tmp/fail0102-96/gcc/gcc/fortran/arith.c: In function 'gfc_arith_and': /tmp/fail0102-96/gcc/gcc/fortran/arith.c:432: error: 'gfc_expr' has no member named 'value' /tmp/fail0102-96/gcc/gcc/fortran/arith.c:432: error: 'gfc_expr' has no member named 'value' fortran: * gfortran.h (struct gfc_expr): Add missing "struct" qualifier for member base_expr. Index: gcc/fortran/gfortran.h === --- gcc/fortran/gfortran.h (revision 182825) +++ gcc/fortran/gfortran.h (working copy) @@ -1699,7 +1699,7 @@ typedef struct gfc_expr /* Used to store the base expression in component calls, when the expression is not a variable. */ - gfc_expr *base_expr; + struct gfc_expr *base_expr; /* is_boz is true if the integer is regarded as BOZ bitpatten and is_snan denotes a signalling not-a-number. */ brgds, H-P