Re: [v3] libstdc++/50441
On Sun, 18 Sep 2011, Paolo Carlini wrote:

> tested x86_64-linux, committed to mainline.

Hello,

bugzilla seems to be down, so let me write it here:

the testsuite uses #ifdef __SIZEOF_INT128__ to test for the availability of a 128 bit integer type. I haven't seen a similar define for float128, you might want to request it. In any case, a way to check for the availability of these types should be documented in the manual...

--
Marc Glisse
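For readers who want to see the check in isolation, here is a minimal sketch. It assumes a GCC-compatible compiler that predefines `__SIZEOF_INT128__` whenever `__int128` exists. The names `widest_uint`, `HAVE_INT128` and `widest_uint_bits` are made up for illustration, and no comparable predefined macro for `__float128` is assumed.

```cpp
#include <climits>
#include <cstddef>

#ifdef __SIZEOF_INT128__
// The compiler advertises a 128-bit integer type.
typedef unsigned __int128 widest_uint;   // hypothetical alias
#define HAVE_INT128 1                    // hypothetical macro
#else
// Fall back to the widest standard unsigned type.
typedef unsigned long long widest_uint;
#define HAVE_INT128 0
#endif

// Width in bits of the unsigned type selected above.
inline std::size_t widest_uint_bits() {
  return sizeof(widest_uint) * CHAR_BIT;
}
```

Code that needs a 128-bit type can then branch on `HAVE_INT128` instead of failing to compile on targets without it.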
Re: PATCH: Replace tmp with __tmp
On Sat, Sep 17, 2011 at 11:26 PM, H.J. Lu wrote:

>>> Agreed. Some parens are missing, though:
>>>
>>> -  unsigned long long tmp = (__X) ^ (__X - 1);
>>> -  return tmp;
>>> +  unsigned long long __tmp = (__X) ^ (__X - 1);
>>> +  return __tmp;
>>
>> There is none missing. This is not a macro.
>>
>
> Here is the updated patch. Tested on Linux/x86-64. OK
> for trunk?

> 2011-09-17  H.J. Lu
>
>        * config/i386/bmiintrin.h: Remove tmp.
>        * config/i386/tbmintrin.h: Likewise.

OK.

Thanks,
Uros.
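The thread does not spell out why `tmp` is a problem in an installed header. The usual reason is that user code may legally define a macro called `tmp`, while names starting with a double underscore are reserved for the implementation. The sketch below imitates a system header for illustration only: the function name `mask_up_to_lowest_set_bit` is hypothetical, and ordinary user code should not declare `__`-prefixed names itself.

```cpp
// A macro that user code is entitled to define before including a header.
#define tmp 42

// Imitation of the intrinsic body from the quoted hunk. Had the local been
// called `tmp`, the macro above would turn its declaration into
// `unsigned long long 42 = ...` and break the build. The reserved names
// __X and __tmp cannot legally collide with user macros.
inline unsigned long long mask_up_to_lowest_set_bit(unsigned long long __X) {
  unsigned long long __tmp = (__X) ^ (__X - 1);
  return __tmp;
}
```

The expression sets every bit up to and including the lowest set bit of the argument, so 12 (binary 1100) yields 7 (binary 0111).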
Re: [v3] libstdc++/50441
On 09/18/2011 09:03 AM, Marc Glisse wrote:

> the testsuite uses #ifdef __SIZEOF_INT128__ to test for the availability of a 128 bit integer type. I haven't seen a similar define for float128,

Thanks. For now I went for a configure test, consistently for int and float, which also allows checking whether the type is the same as an existing one (otherwise we risk bad errors due to duplicate specializations, which is quite possible right now for __float128!).

Paolo.
[patch] Fix PR tree-optimization/50412
Hi,

Strided accesses of a single element or with gaps may require the creation of an epilogue loop. At the moment we don't support peeling for outer loops; therefore, we should not allow such strided accesses in outer loops.

Bootstrapped and tested on powerpc64-suse-linux. Committed to trunk. Now testing for 4.6. OK for 4.6 when the testing completes?

Thanks,
Ira

ChangeLog:

        PR tree-optimization/50412
        * tree-vect-data-refs.c (vect_analyze_group_access): Fail for
        accesses that require epilogue loop if vectorizing outer loop.

testsuite/ChangeLog:

        PR tree-optimization/50412
        * gfortran.dg/vect/pr50412.f90: New.

Index: tree-vect-data-refs.c
===
--- tree-vect-data-refs.c (revision 178939)
+++ tree-vect-data-refs.c (working copy)
@@ -2060,7 +2060,11 @@ vect_analyze_group_access (struct data_reference *
   HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
   HOST_WIDE_INT stride, last_accessed_element = 1;
   bool slp_impossible = false;
+  struct loop *loop = NULL;
 
+  if (loop_vinfo)
+    loop = LOOP_VINFO_LOOP (loop_vinfo);
+
   /* For interleaving, STRIDE is STEP counted in elements, i.e., the size of the
      interleaving group (including gaps).  */
   stride = dr_step / type_size;
@@ -2090,11 +2094,18 @@ vect_analyze_group_access (struct data_reference *
 
           if (loop_vinfo)
             {
-              LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
-
               if (vect_print_dump_info (REPORT_DETAILS))
                 fprintf (vect_dump, "Data access with gaps requires scalar "
                                     "epilogue loop");
+              if (loop->inner)
+                {
+                  if (vect_print_dump_info (REPORT_DETAILS))
+                    fprintf (vect_dump, "Peeling for outer loop is not"
+                                        " supported");
+                  return false;
+                }
+
+              LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
             }
 
           return true;
@@ -2277,10 +2288,17 @@ vect_analyze_group_access (struct data_reference *
       /* There is a gap in the end of the group.  */
       if (stride - last_accessed_element > 0 && loop_vinfo)
         {
-          LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
           if (vect_print_dump_info (REPORT_DETAILS))
             fprintf (vect_dump, "Data access with gaps requires scalar "
                                 "epilogue loop");
+          if (loop->inner)
+            {
+              if (vect_print_dump_info (REPORT_DETAILS))
+                fprintf (vect_dump, "Peeling for outer loop is not supported");
+              return false;
+            }
+
+          LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
         }
     }
 
Index: testsuite/gfortran.dg/vect/pr50412.f90
===
--- testsuite/gfortran.dg/vect/pr50412.f90 (revision 0)
+++ testsuite/gfortran.dg/vect/pr50412.f90 (revision 0)
@@ -0,0 +1,12 @@
+! { dg-do compile }
+
+      DOUBLE PRECISION AK,AI,AAE
+      COMMON/com/AK(36),AI(4,4),AAE(8,4),ii,jj
+      DO 20 II=1,4
+        DO 21 JJ=1,4
+          AK(n)=AK(n)-AAE(I,II)*AI(II,JJ)
+   21   CONTINUE
+   20 CONTINUE
+      END
+
+! { dg-final { cleanup-tree-dump "vect" } }
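To make the "gaps require a scalar epilogue" point concrete, here is a simplified model, not GCC's implementation. It sums every second element of an array. A block-wise ("vector") body loads whole groups including the unused gap element, and the last such load could run past the end of the array, so the tail is handled by a scalar loop. The vectorization factor `VF` and both function names are assumptions for illustration. GCC peels the final iterations unconditionally instead of doing the explicit bounds test shown here.

```cpp
#include <cstddef>
#include <vector>

// Reference: sum a[0], a[2], ..., a[2*(n-1)]. Only one element of each
// two-element group is used, so every group has a gap.
long strided_sum_scalar(const std::vector<int>& a, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += a[2 * i];
  return sum;
}

// Model of a vectorized version with a scalar epilogue.
long strided_sum_blocked(const std::vector<int>& a, std::size_t n) {
  const std::size_t VF = 4;  // hypothetical vectorization factor
  long sum = 0;
  std::size_t i = 0;
  // "Vector" body: each step loads the 2*VF contiguous elements
  // a[2*i] .. a[2*i + 2*VF - 1], gaps included. The body stops as soon as
  // such a load would touch memory past the end of the array.
  for (; i + VF <= n && 2 * i + 2 * VF <= a.size(); i += VF) {
    int block[2 * VF];
    for (std::size_t k = 0; k < 2 * VF; ++k)
      block[k] = a[2 * i + k];
    for (std::size_t k = 0; k < VF; ++k)
      sum += block[2 * k];
  }
  // Scalar epilogue: the remaining iterations touch only the elements
  // that are really used.
  for (; i < n; ++i)
    sum += a[2 * i];
  return sum;
}
```

With an array of 15 elements and n = 8, the second block would need index 15, so it is left to the epilogue. The patch above rejects this situation for outer loops because no such epilogue can be generated there.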
Re: [v3] libstdc++/50441
On Sun, 18 Sep 2011, Paolo Carlini wrote:

> On 09/18/2011 09:03 AM, Marc Glisse wrote:
>> the testsuite uses #ifdef __SIZEOF_INT128__ to test for the availability of a 128 bit integer type. I haven't seen a similar define for float128,
>
> Thanks. For now I went for a configure test, consistently for int and float, which also allows to check whether the type is the same as an existing one (otherwise we risk bad errors due to duplicate specializations, well possible right now for __float128!).

Indeed!

The documentation is not clear on whether __int128 and __float128 may be the same types as say long long and long double, or they are different types even if they have the same size (the doc was written for C, where it doesn't matter as much).

--
Marc Glisse
Re: PATCH: Replace tmp with __tmp
... probably somebody will hate me, but stylistically I also don't understand why the names are uppercase.

Paolo.
[patch] Fix tree-optimization/50414
Hi,

This patch adds the missing handling of MAX_EXPR/MIN_EXPR in SLP reduction.

Bootstrapped and tested on powerpc64-suse-linux. Committed to trunk.

Ira

ChangeLog:

        PR tree-optimization/50414
        * tree-vect-slp.c (vect_get_constant_vectors): Handle MAX_EXPR
        and MIN_EXPR.

testsuite/ChangeLog:

        PR tree-optimization/50414
        * gfortran.dg/vect/Ofast-pr50414.f90: New.
        * gfortran.dg/vect/vect.exp: Run Ofast-* tests with -Ofast.
        * gcc.dg/vect/no-scevccp-noreassoc-slp-reduc-7.c: New.

Index: tree-vect-slp.c
===
--- tree-vect-slp.c (revision 178939)
+++ tree-vect-slp.c (working copy)
@@ -1902,6 +1902,8 @@ vect_get_constant_vectors (tree op, slp_tree slp_n
   bool constant_p, is_store;
   tree neutral_op = NULL;
   enum tree_code code = gimple_assign_rhs_code (stmt);
+  gimple def_stmt;
+  struct loop *loop;
 
   if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
     {
@@ -1943,8 +1945,16 @@ vect_get_constant_vectors (tree op, slp_tree slp_n
             neutral_op = build_int_cst (TREE_TYPE (op), -1);
             break;
 
+          case MAX_EXPR:
+          case MIN_EXPR:
+            def_stmt = SSA_NAME_DEF_STMT (op);
+            loop = (gimple_bb (stmt))->loop_father;
+            neutral_op = PHI_ARG_DEF_FROM_EDGE (def_stmt,
+                                                loop_preheader_edge (loop));
+            break;
+
           default:
-            neutral_op = NULL;
+            neutral_op = NULL;
         }
     }
 
@@ -1997,8 +2007,8 @@ vect_get_constant_vectors (tree op, slp_tree slp_n
 
   if (reduc_index != -1)
     {
-      struct loop *loop = (gimple_bb (stmt))->loop_father;
-      gimple def_stmt = SSA_NAME_DEF_STMT (op);
+      loop = (gimple_bb (stmt))->loop_father;
+      def_stmt = SSA_NAME_DEF_STMT (op);
 
       gcc_assert (loop);
 
Index: testsuite/gfortran.dg/vect/Ofast-pr50414.f90
===
--- testsuite/gfortran.dg/vect/Ofast-pr50414.f90 (revision 0)
+++ testsuite/gfortran.dg/vect/Ofast-pr50414.f90 (revision 0)
@@ -0,0 +1,11 @@
+! { dg-do compile }
+
+      SUBROUTINE SUB (A,L,YMAX)
+      DIMENSION A(L)
+      YMA=A(1)
+      DO 2 I=1,L,2
+    2 YMA=MAX(YMA,A(I),A(I+1))
+      CALL PROUND(YMA)
+      END
+
+! { dg-final { cleanup-tree-dump "vect" } }
Index: testsuite/gfortran.dg/vect/vect.exp
===
--- testsuite/gfortran.dg/vect/vect.exp (revision 178939)
+++ testsuite/gfortran.dg/vect/vect.exp (working copy)
@@ -84,6 +84,12 @@ lappend DEFAULT_VECTCFLAGS "-O3"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/O3-*.\[fF\]{,90,95,03,08} ]] \
         "" $DEFAULT_VECTCFLAGS
 
+# With -Ofast
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-Ofast"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/Ofast-*.\[fF\]{,90,95,03,08} ]] \
+        "" $DEFAULT_VECTCFLAGS
+
 # Clean up.
 set dg-do-what-default ${save-dg-do-what-default}
 
Index: testsuite/gcc.dg/vect/no-scevccp-noreassoc-slp-reduc-7.c
===
--- testsuite/gcc.dg/vect/no-scevccp-noreassoc-slp-reduc-7.c (revision 0)
+++ testsuite/gcc.dg/vect/no-scevccp-noreassoc-slp-reduc-7.c (revision 0)
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include
+#include "tree-vect.h"
+
+#define N 16
+#define MAX 121
+
+unsigned int ub[N] = {0,3,6,9,12,15,18,121,24,27,113,33,36,39,42,45};
+
+/* Vectorization of reduction using loop-aware SLP (with unrolling).  */
+
+__attribute__ ((noinline))
+int main1 (int n)
+{
+  int i;
+  unsigned int max = 50;
+
+  for (i = 0; i < n; i++) {
+    max = max < ub[2*i] ? ub[2*i] : max;
+    max = max < ub[2*i + 1] ? ub[2*i + 1] : max;
+  }
+
+  /* Check results:  */
+  if (max != MAX)
+    abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 (N/2);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
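The choice of the loop's initial value as the "neutral" operand for MAX/MIN can be illustrated with a two-lane model. This is a simplified sketch, not the vectorizer's code, and the names `slp_max` and `slp_sum` are hypothetical. Because max is idempotent, seeding every lane with the initial value does not change the result. A sum reduction, by contrast, must seed the extra lane with 0 or the initial value would be counted twice.

```cpp
#include <algorithm>
#include <cstddef>

// Two-lane model of a max reduction over data[2*i], data[2*i + 1].
// Both lanes start from the scalar loop's initial value: since
// max(max(init, a), max(init, b)) == max(init, a, b), replicating
// init across lanes is harmless.
unsigned slp_max(const unsigned* data, std::size_t pairs, unsigned init) {
  unsigned lane0 = init, lane1 = init;
  for (std::size_t i = 0; i < pairs; ++i) {
    lane0 = std::max(lane0, data[2 * i]);
    lane1 = std::max(lane1, data[2 * i + 1]);
  }
  return std::max(lane0, lane1);
}

// For comparison: a sum reduction seeds only one lane with init and the
// other with the neutral element 0.
unsigned slp_sum(const unsigned* data, std::size_t pairs, unsigned init) {
  unsigned lane0 = init, lane1 = 0;
  for (std::size_t i = 0; i < pairs; ++i) {
    lane0 += data[2 * i];
    lane1 += data[2 * i + 1];
  }
  return lane0 + lane1;
}
```

Calling `slp_max` on the `ub` array from the new C test with an initial value of 50 gives 121, the value the test checks for.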
[patch] Fix PR testsuite/50435
Hi,

This patch adds an if-statement to avoid loop vectorization and fixes underscores around restrict in gcc.dg/vect/bb-slp-25.c.

Tested by Dominique on x86_64-apple-darwin10 and on x86_64-suse-linux. Committed to trunk.

Ira

2011-09-18  Dominique d'Humieres
            Ira Rosen

        PR testsuite/50435
        * gcc.dg/vect/bb-slp-25.c: Add an if to avoid loop vectorization.
        Fix underscores around restrict.

Index: testsuite/gcc.dg/vect/bb-slp-25.c
===
--- testsuite/gcc.dg/vect/bb-slp-25.c (revision 178940)
+++ testsuite/gcc.dg/vect/bb-slp-25.c (working copy)
@@ -9,7 +9,7 @@
 
 short src[N], dst[N];
 
-void foo (short * __restrict dst, short * __restrict src, int h, int stride)
+void foo (short * __restrict__ dst, short * __restrict__ src, int h, int stride, int dummy)
 {
   int i;
   h /= 16;
@@ -25,6 +25,8 @@ void foo (short * __restrict dst, short
       dst[7] += A*src[7] + src[7+stride];
       dst += 8;
       src += 8;
+      if (dummy == 32)
+        abort ();
     }
 }
 
@@ -41,7 +43,7 @@ int main (void)
       src[i] = i;
     }
 
-  foo (dst, src, N, 8);
+  foo (dst, src, N, 8, 0);
 
   for (i = 0; i < N/2; i++)
     {
Re: [v3] libstdc++/50441
On 09/18/2011 11:07 AM, Marc Glisse wrote:

> Indeed! The documentation is not clear on whether __int128 and __float128 may be the same types as say long long and long double, or they are different types even if they have the same size (the doc was written for C, where it doesn't matter as much).

For sure __float80 is just long double on x86. And by the way, given the infrastructure in place, it would be easy to safely add the former too, if somebody asks for it (I think it's currently supported only for targets where it actually just boils down to long double, though)

Paolo.
Re: [PATCH 6/7] Kill pedantic warnings on system headers macros
Jason Merrill writes:

> On 09/16/2011 04:46 AM, Dodji Seketeli wrote:
>>  struct c_declspecs *
>> -finish_declspecs (struct c_declspecs *specs)
>> +finish_declspecs (struct c_declspecs *specs,
>> +                  location_t where)
>
> Let's call this first_token_loc, too.  And mention it in the function
> comment.
>
> OK with that change.

Thanks. For the record, this is the updated patch.

From: Dodji Seketeli
Date: Sat, 4 Dec 2010 18:35:47 +0100
Subject: [PATCH 6/7] Kill pedantic warnings on system headers macros

This patch leverages the virtual location infrastructure to avoid emitting pedantic warnings related to macros defined in system headers but expanded in normal TUs.

The point is to make diagnostic routines use virtual locations of tokens instead of their spelling locations. The diagnostic routines in turn indirectly use linemap_location_in_system_header_p to know if a given virtual location originated from a system header.

The patch has two main parts.

The libcpp part makes diagnostic routines called from the preprocessor expression parsing and number conversion code use virtual locations.

The C FE part makes diagnostic routines called from the type specifiers validation code use virtual locations.

This fixes the relevant examples presented in the comments of the bug, but I guess that, as usual, libcpp and the FEs will need ongoing care to use more and more virtual locations of tokens instead of spelling locations.

The combination of the patch and the previous ones bootstrapped with --enable-languages=all,ada and passed regression tests on x86_64-unknown-linux-gnu.

libcpp/

        * include/cpplib.h (cpp_classify_number): Add a location parameter
        to the declaration.
        * expr.c (SYNTAX_ERROR_AT, SYNTAX_ERROR2_AT): New macros to emit
        syntax error using a virtual location.
        (cpp_classify_number): Add a virtual location parameter. Use
        SYNTAX_ERROR_AT instead of SYNTAX_ERROR, cpp_error_with_line
        instead of cpp_error and cpp_warning_with_line instead of
        cpp_warning. Pass the new virtual location parameter to those
        diagnostic routines.
        (eval_token): Add a virtual location parameter. Pass it down to
        cpp_classify_number. Use cpp_error_with_line instead of
        cpp_error, cpp_warning_with_line instead of cpp_warning, and pass
        the new virtual location parameter to these.
        (_cpp_parse_expr): Use cpp_get_token_with_location instead of
        cpp_get_token, to get the virtual location of the token. Use
        SYNTAX_ERROR2_AT instead of SYNTAX_ERROR2, cpp_error_with_line
        instead of cpp_error. Use the virtual location instead of the
        spelling location.
        * macro.c (maybe_adjust_loc_for_trad_cpp): Define new static
        function.
        (cpp_get_token_with_location): Use it.

gcc/c-family

        * c-lex.c (c_lex_with_flags): Adjust to pass the virtual location
        to cpp_classify_number.

gcc/

        * c-tree.h (finish_declspecs): Add a virtual location parameter.
        * c-decl.c (finish_declspecs): Add a virtual location parameter.
        Use error_at instead of error and pass down the virtual location
        to pedwarn and error_at.
        (declspecs_add_type): Use in_system_header_at instead of
        in_system_header.
        * c-parser.c (c_parser_declaration_or_fndef): Pass virtual
        location of the relevant token to finish_declspecs.
        (c_parser_struct_declaration, c_parser_parameter_declaration):
        Likewise.
        (c_parser_type_name): Likewise.

gcc/testsuite/

        * gcc.dg/cpp/syshdr3.h: New test header.
        * gcc.dg/cpp/syshdr3.c: New test file.
        * gcc.dg/nofixed-point-2.c: Adjust to more precise location.
---
 gcc/c-decl.c                           |   22 +++--
 gcc/c-family/c-lex.c                   |    4 +-
 gcc/c-parser.c                         |   12 ++-
 gcc/c-tree.h                           |    2 +-
 gcc/testsuite/gcc.dg/cpp/syshdr3.c     |   16 +++
 gcc/testsuite/gcc.dg/cpp/syshdr3.h     |    7 ++
 gcc/testsuite/gcc.dg/nofixed-point-2.c |    6 +-
 libcpp/expr.c                          |  173 +++-
 libcpp/include/cpplib.h                |    3 +-
 9 files changed, 153 insertions(+), 92 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr3.c
 create mode 100644 gcc/testsuite/gcc.dg/cpp/syshdr3.h

diff --git a/gcc/c-decl.c b/gcc/c-decl.c
index 5d4564a..cd1b276 100644
--- a/gcc/c-decl.c
+++ b/gcc/c-decl.c
@@ -8983,7 +8983,7 @@ declspecs_add_type (location_t loc, struct c_declspecs *specs,
           break;
         case RID_COMPLEX:
           dupe = specs->complex_p;
-          if (!flag_isoc99 && !in_system_header)
+          if (!flag_isoc99 && !in_system_header_at (loc))
             pedwarn (loc, OPT_pedantic,
                      "ISO C90 does not support complex types");
           if (specs->typespec_word == cts_void)
@@ -9508,10 +9508,12
Re: [PATCH 7/7] Reduce memory waste due to non-power-of-2 allocs
Jason Merrill writes:

> On 09/17/2011 07:08 AM, Dodji Seketeli wrote:
>> OK, so the patch below extracts a public ggc_alloced_size_for_request
>> function from the different implementations of the ggc allocator's
>> interface, and lets new_linemap use that.
>
> Maybe "ggc_round_alloc_size"?

OK, updated the patch below accordingly.

> OK with that change if nobody else has comments this week.

Thanks. Below is the updated patch.

From: Dodji Seketeli
Date: Tue, 17 May 2011 16:48:01 +0200
Subject: [PATCH 7/7] Reduce memory waste due to non-power-of-2 allocs

This patch arranges for the allocation size of line_map buffers to be as close as possible to a power of two. This *significantly* decreases peak memory consumption, as (macro) maps are numerous and stay live for the whole compilation.

The patch adds a new ggc_round_alloc_size interface to the ggc allocator. In each of the two main allocator implementations ('page' and 'zone') the function has been extracted from the main allocation function code and returns the actual size of the allocated memory region, thus giving the caller a chance to maximize the amount of memory it actually uses from the allocated memory region. In the 'none' allocator implementation (which uses xmalloc), ggc_round_alloc_size just returns the requested allocation size.

Tested on x86_64-unknown-linux-gnu against trunk for each allocator.

libcpp/

        * include/line-map.h (struct line_maps::alloced_size_for_request):
        New member.
        * line-map.c (new_linemap): Use set->alloced_size_for_request to
        get the actual allocated size of line maps.

gcc/

        * ggc.h (ggc_round_alloc_size): Declare new public entry point.
        * ggc-none.c (ggc_round_alloc_size): New public stub function.
        * ggc-page.c (ggc_alloced_size_order_for_request): New static
        function. Factorized from ggc_internal_alloc_stat.
        (ggc_round_alloc_size): New public function. Uses
        ggc_alloced_size_order_for_request.
        (ggc_internal_alloc_stat): Use ggc_alloced_size_order_for_request.
        * ggc-zone.c (ggc_round_alloc_size): New public function
        extracted from ggc_internal_alloc_zone_stat.
        (ggc_internal_alloc_zone_stat): Use ggc_round_alloc_size.
        * toplev.c (general_init): Initialize
        line_table->alloced_size_for_request.
---
 gcc/ggc-none.c            |    9 +++
 gcc/ggc-page.c            |   53 +++-
 gcc/ggc-zone.c            |   27 --
 gcc/ggc.h                 |    2 +
 gcc/toplev.c              |    1 +
 libcpp/include/line-map.h |    8 ++
 libcpp/line-map.c         |   39 -
 7 files changed, 114 insertions(+), 25 deletions(-)

diff --git a/gcc/ggc-none.c b/gcc/ggc-none.c
index 97d25b9..e57d617 100644
--- a/gcc/ggc-none.c
+++ b/gcc/ggc-none.c
@@ -39,6 +39,15 @@ ggc_alloc_typed_stat (enum gt_types_enum ARG_UNUSED (gte), size_t size
   return xmalloc (size);
 }
 
+/* For a given size of memory requested for allocation, return the
+   actual size that is going to be allocated.  */
+
+size_t
+ggc_round_alloc_size (size_t requested_size)
+{
+  return requested_size;
+}
+
 void *
 ggc_internal_alloc_stat (size_t size MEM_STAT_DECL)
 {
diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index 624f029..f919a6b 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -1054,6 +1054,47 @@ static unsigned char size_lookup[NUM_SIZE_LOOKUP] =
   9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9
 };
 
+/* For a given size of memory requested for allocation, return the
+   actual size that is going to be allocated, as well as the size
+   order.  */
+
+static void
+ggc_round_alloc_size_1 (size_t requested_size,
+                        size_t *size_order,
+                        size_t *alloced_size)
+{
+  size_t order, object_size;
+
+  if (requested_size < NUM_SIZE_LOOKUP)
+    {
+      order = size_lookup[requested_size];
+      object_size = OBJECT_SIZE (order);
+    }
+  else
+    {
+      order = 10;
+      while (requested_size > (object_size = OBJECT_SIZE (order)))
+        order++;
+    }
+
+  if (size_order)
+    *size_order = order;
+  if (alloced_size)
+    *alloced_size = object_size;
+}
+
+/* For a given size of memory requested for allocation, return the
+   actual size that is going to be allocated.  */
+
+size_t
+ggc_round_alloc_size (size_t requested_size)
+{
+  size_t size = 0;
+
+  ggc_round_alloc_size_1 (requested_size, NULL, &size);
+  return size;
+}
+
 /* Typed allocation function.  Does nothing special in this collector.  */
 
 void *
@@ -1072,17 +1113,7 @@ ggc_internal_alloc_stat (size_t size MEM_STAT_DECL)
   struct page_entry *entry;
   void *result;
 
-  if (size < NUM_SIZE_LOOKUP)
-    {
-      order = size_lookup[size];
-      object_size = OBJECT_SIZE (order);
-    }
-  else
-    {
-      order = 10;
-      while (size > (object_size = OBJECT_SIZE (order)))
-        order++;
-    }
+  ggc
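The idea behind using the rounded size can be sketched as follows. This is a hedged illustration, not the patch's code: `round_alloc_size` stands in for `ggc_round_alloc_size` and assumes plain power-of-two size classes, whereas the real page allocator consults its own table of size orders. `usable_slots` is a hypothetical helper showing how a caller such as new_linemap could size its buffer to use all the memory it is given.

```cpp
#include <cstddef>

// Hypothetical stand-in for ggc_round_alloc_size: the size an allocator
// with pure power-of-two size classes would really hand out.
std::size_t round_alloc_size(std::size_t requested) {
  std::size_t size = 1;
  while (size < requested)
    size <<= 1;
  return size;
}

// Number of elements of elem_size bytes that fit in the block actually
// allocated when room for `wanted` elements is requested.
std::size_t usable_slots(std::size_t wanted, std::size_t elem_size) {
  return round_alloc_size(wanted * elem_size) / elem_size;
}
```

Asking for 5 elements of 20 bytes requests 100 bytes but receives 128, which is room for 6 elements. Recording the capacity as 6 instead of 5 is the kind of saving the patch description refers to.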
Re: [rs6000] Fix PR target/50091
> Tested on PowerPC/Darwin by Iain and on PowerPC/Linux by me. OK for mainline and the 4.6/4.5 branches?
>
> 2011-09-06  Eric Botcazou
>             Iain Sandoe
>
>        PR target/50091
>        * config/rs6000/rs6000.md (probe_stack): Use explicit operand.
>        * config/rs6000/rs6000.c (output_probe_stack_range): Likewise.

Okay everywhere.

Thanks, David
Re: [v3] libstdc++/50441
On Sun, 18 Sep 2011, Marc Glisse wrote:

> On Sun, 18 Sep 2011, Paolo Carlini wrote:
>
> > On 09/18/2011 09:03 AM, Marc Glisse wrote:
> > > the testsuite uses #ifdef __SIZEOF_INT128__ to test for the availability
> > > of a 128 bit integer type. I haven't seen a similar define for float128,
> > Thanks. For now I went for a configure test, consistently for int and float,
> > which also allows to check whether the type is the same as an existing one
> > (otherwise we risk bad errors due to duplicate specializations, well
> > possible right now for __float128!).
>
> Indeed!
> The documentation is not clear on whether __int128 and __float128 may be the
> same types as say long long and long double, or they are different types even
> if they have the same size (the doc was written for C, where it doesn't matter
> as much).

__int128 and unsigned __int128 are currently separate types, just like long and long long are always distinct. I'm not sure you should rely on them being distinct on any hypothetical future target where long long is 128-bit; if we added __int64 I'm not sure having it a distinct type would be the most useful implementation.

__int128_t and __uint128_t are legacy typedefs for __int128 and unsigned __int128.

__float128 and __float80 are typedefs. It appears (without testing) that IA64 __float80 is always a distinct type but otherwise those names will be typedefs for long double if they have the same representation and alignment as long double.

--
Joseph S. Myers
jos...@codesourcery.com
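The statements about type identity can be checked at compile time. The following is a small sketch assuming a C++11 compiler that provides `__int128`. The constant names are hypothetical, and the `__float128` and `__float80` cases are left out because their availability varies by target.

```cpp
#include <type_traits>

#ifdef __SIZEOF_INT128__
// True when __int128 is a type of its own rather than another name for
// long or long long.
constexpr bool int128_is_distinct =
    !std::is_same<__int128, long long>::value &&
    !std::is_same<__int128, long>::value;

// True when the legacy names denote exactly the same types as the
// keyword spellings.
constexpr bool legacy_names_are_typedefs =
    std::is_same<__int128_t, __int128>::value &&
    std::is_same<__uint128_t, unsigned __int128>::value;
#endif
```

A configure-style test of the kind Paolo mentions could compile such a snippet with a `static_assert` and decide from the outcome whether extra specializations are safe to add.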
Re: [v3] libstdc++/50441
On 09/18/2011 08:36 PM, Joseph S. Myers wrote:

> __int128_t and __uint128_t are legacy typedefs for __int128 and unsigned __int128.

I didn't realize this. Thus I guess, for 50441 and also for 40856 (which I'm about to do) better doing everything in terms of __int128 and unsigned __int128.

Paolo.
Re: [v3] libstdc++/50441
On 09/18/2011 08:58 PM, Paolo Carlini wrote:

> On 09/18/2011 08:36 PM, Joseph S. Myers wrote:
>> __int128_t and __uint128_t are legacy typedefs for __int128 and unsigned __int128.
>
> I didn't realize this. Thus I guess, for 50441 and also for 40856 (which I'm about to do) better doing everything in terms of __int128 and unsigned __int128.

I'm currently blocked by the following issue. If I try to compile, with -std=gnu++98 (the default for C++) and -pedantic-errors:

template<typename T>
  struct limits;

template<>
  struct limits<__int128> { };

template<>
  struct limits<unsigned __int128> { };

I get:

a.cc:8:26: error: ISO C++ does not support ‘__int128’ for ‘type name’ [-pedantic]
a.cc:8:10: error: redefinition of ‘struct limits<__int128>’
a.cc:5:10: error: previous definition of ‘struct limits<__int128>’

This of course does *not* happen with __int128_t and __uint128_t. Apparently I can suppress such -pedantic and -pedantic-errors issues in pragma system_header headers, but they then resurface when using PCHs.

Please let me know whether the above is supposed to work in a gnu++98 (or gnu++0x) system header, also together with any -pedantic options, or whether I should really use __int128_t and __uint128_t for the time being; I would certainly be OK with the latter.

Thanks!
Paolo.
Re: PowerPC shrink-wrap support 0 of 3
On Sat, Sep 17, 2011 at 03:26:21PM +0200, Bernd Schmidt wrote:

> On 09/17/11 09:16, Alan Modra wrote:
> > This patch series adds shrink-wrap support for PowerPC.  The patches
> > are on top of Bernd's "Initial shrink-wrapping patch":
> > http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the
> > tm.texi patch applied to tm.texi.in.  Bootstrapped and regression
> > tested powerpc64-linux all langs except ada, and spec CPU2006 tested.
> > The spec results were a little disappointing as I expected to see some
> > gains, but my baseline was a -O3 run and I suppose most of the
> > shrink-wrap opportunities were lost to inlining.
>
> The last posted version had a bug that crept in during the review cycle,
> and which made it quite ineffective.

I wasn't complaining! My disappointment really stemmed from having unrealistically high expectations. I still think this optimization is a great feature. Thanks for contributing it!

--
Alan Modra
Australia Development Lab, IBM
Re: [v3] libstdc++/50441
Hi again,

just a few more details:

> I'm currently blocked by the following issue. If I try to compile, with -std=gnu++98 (the default for C++) and -pedantic-errors:
>
> template<typename T>
>   struct limits;
>
> template<>
>   struct limits<__int128> { };
>
> template<>
>   struct limits<unsigned __int128> { };
>
> I get:
>
> a.cc:8:26: error: ISO C++ does not support ‘__int128’ for ‘type name’ [-pedantic]
> a.cc:8:10: error: redefinition of ‘struct limits<__int128>’
> a.cc:5:10: error: previous definition of ‘struct limits<__int128>’

If I remove the second specialization:

template<typename T>
  struct limits;

template<>
  struct limits<__int128> { };

then it compiles just fine, with -pedantic and -pedantic-errors too. Thus, it looks like something is definitely wrong here. Well, the first and second lines of the error message above, wrongly talking about __int128 instead of unsigned __int128, should also be a hint...

Thanks again,
Paolo.
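For reference, this is the intended shape of the code, written out under the assumption that the second specialization is for `unsigned __int128`, as the remark about the diagnostic suggests. The `is_signed` members are added purely for illustration. The sketch assumes a compiler with `__int128` and default (non-pedantic) options, and makes no claim about how any particular GCC version behaves with -pedantic-errors.

```cpp
// Primary template, declared but never defined.
template<typename T>
  struct limits;

#ifdef __SIZEOF_INT128__
// Signed and unsigned __int128 are different types, so they need two
// separate specializations that must not be treated as redefinitions.
template<>
  struct limits<__int128>
  { static const bool is_signed = true; };

template<>
  struct limits<unsigned __int128>
  { static const bool is_signed = false; };
#endif
```

If the two specializations are correctly seen as distinct, `limits<__int128>::is_signed` and `limits<unsigned __int128>::is_signed` yield different values.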
Re: [PATCH 7/7] Reduce memory waste due to non-power-of-2 allocs
2011/9/17 Dodji Seketeli :

> OK, so the patch below extracts a public ggc_alloced_size_for_request
> function from the different implementations of the ggc allocator's
> interface, and lets new_linemap use that.

> libcpp/
>
>        * include/line-map.h (struct line_maps::alloced_size_for_request):
>        New member.
>        * line-map.c (new_linemap): Use set->alloced_size_for_request to
>        get the actual allocated size of line maps.
>
> gcc/
>
>        * ggc.h (ggc_alloced_size_for_request): Declare new public entry
>        point.
>        * ggc-none.c (ggc_alloced_size_for_request): New public stub
>        function.
>        * ggc-page.c (ggc_alloced_size_order_for_request): New static
>        function.  Factorized from ggc_internal_alloc_stat.
>        (ggc_alloced_size_for_request): New public function.  Uses
>        ggc_alloced_size_order_for_request.
>        (ggc_internal_alloc_stat): Use ggc_alloced_size_order_for_request.
>        * ggc-zone.c (ggc_alloced_size_for_request): New public function
>        extracted from ggc_internal_alloc_zone_stat.
>        (ggc_internal_alloc_zone_stat): Use ggc_alloced_size_for_request.
>        * toplev.c (general_init): Initialize
>        line_table->alloced_size_for_request.

For the record, the patch is fine with me. (I cannot approve it though, but you already got the approval)

Thanks,
--
Laurynas
[ARM] pass "--be8" to linker when linking for M profile
Hi,
Attached is the second version of the patch, with the changes mentioned previously.
Is it OK?

Thanks-chengbin

2011-09-16  Cheng Bin

        * config/arm/bpabi.h (BE8_LINK_SPEC): Add cortex-m arch and processors.

> -----Original Message-----
> From: Richard Earnshaw
> Sent: Thursday, September 15, 2011 6:46 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [ARM] pass "--be8" to linker when linking for M profile
>
> On 15/09/11 03:41, Bin Cheng wrote:
> > Hi,
> > The linker should do endian swizzling at link-time according to "--be8"
> > option.
> > This patch modifies BE8_LINK_SPEC by adding cortex-m processors in
> > the specs string.
> >
> > Since R-profile supports configurable big-endian instruction fetch,
> > I didn't include it here.
> >
> > Is it ok? Thanks.
> >
> > 2011-09-15 Cheng Bin
> > * config/arm/bpabi.h (BE8_LINK_SPEC): add cortex-m
> > arch and processors.
> >
> > Thanks-chengbin
> >
> >
> > gcc-be8-for-m-profile.patch
> >
> >
>
> +#define BE8_LINK_SPEC \
> +  " %{mbig-endian:%{march=armv7-a|mcpu=cortex-a5 \
> +    |mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15 \
> +    |march=armv7-m|march=armv7e-m|mcpu=cortex-m3|mcpu=cortex-m4 \
> +    |march=armv6-m|mcpu=cortex-m0:%{!r:--be8}}}"
>
>
> Please sort this so that the list is ordered alphabetically by
> architecture/cpu (with architectures first).
>
> It might save some patch churn in the future if each element was put
> on a line on its own.
>
> OK with that change.
>
> R.

gcc-be8-for-m-profile-20110916.patch
Description: Binary data
[arm-embedded] Backport mainline 171225
Committed

        Backport r171225 from mainline
        2011-03-21  Rainer Orth

        PR bootstrap/48120:
        * configure.ac (pwllib): Use LIBS instead of LDFLAGS.
        Add -lstdc++ -lm to LIBS.
        * configure: Regenerate.

Index: configure
===
--- configure (revision 171224)
+++ configure (revision 171225)
@@ -5725,8 +5725,8 @@
 
 if test "x$with_ppl" != xno; then
   if test "x$pwllib" = x; then
-    saved_LDFLAGS="$LDFLAGS"
-    LDFLAGS="$LDFLAGS $ppllibs"
+    saved_LIBS="$LIBS"
+    LIBS="$LIBS $ppllibs -lstdc++ -lm"
     { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PWL_handle_timeout in -lpwl" >&5
 $as_echo_n "checking for PWL_handle_timeout in -lpwl... " >&6; }
 if test "${ac_cv_lib_pwl_PWL_handle_timeout+set}" = set; then :
@@ -5767,7 +5767,7 @@
   pwllib="-lpwl"
 fi
 
-    LDFLAGS="$saved_LDFLAGS"
+    LIBS="$saved_LIBS"
   fi
 
   ppllibs="$ppllibs -lppl_c -lppl $pwllib -lgmpxx"
Index: configure.ac
===
--- configure.ac (revision 171224)
+++ configure.ac (revision 171225)
@@ -1677,10 +1677,10 @@
 
 if test "x$with_ppl" != xno; then
   if test "x$pwllib" = x; then
-    saved_LDFLAGS="$LDFLAGS"
-    LDFLAGS="$LDFLAGS $ppllibs"
-    AC_CHECK_LIB(pwl,PWL_handle_timeout,[pwllib="-lpwl"])
-    LDFLAGS="$saved_LDFLAGS"
+    saved_LIBS="$LIBS"
+    LIBS="$LIBS $ppllibs -lstdc++ -lm"
+    AC_CHECK_LIB(pwl, PWL_handle_timeout, [pwllib="-lpwl"])
+    LIBS="$saved_LIBS"
   fi
 
   ppllibs="$ppllibs -lppl_c -lppl $pwllib -lgmpxx"
[arm-embedded] Backport mainline 171096 .. 174035
Backport from mainline to arm-embedded branch r171096, r171251, r171379, r171632, r171978, r172297, r174035. Committed.

2011-09-19  chengbin

        Backport r174035 from mainline
        2011-05-22  Tom de Vries

        PR middle-end/48689
        * fold-const.c (fold_checksum_tree): Guard TREE_CHAIN use with
        CODE_CONTAINS_STRUCT (TS_COMMON).

        Backport r172297 from mainline
        2011-04-11  Chung-Lin Tang
                    Richard Earnshaw

        PR target/48250
        * config/arm/arm.c (arm_legitimize_reload_address): Update cases
        to use sign-magnitude offsets. Reject unsupported unaligned
        cases. Add detailed description in comments.
        * config/arm/arm.md (reload_outdf): Disable for ARM mode; change
        condition from TARGET_32BIT to TARGET_ARM.

        Backport r171978 from mainline
        2011-04-05  Tom de Vries

        PR target/43920
        * config/arm/arm.h (BRANCH_COST): Set to 1 for Thumb-2 when
        optimizing for size.

        Backport r171632 from mainline
        2011-03-28  Richard Sandiford

        * builtins.c (expand_builtin_memset_args): Use gen_int_mode
        instead of GEN_INT.

        Backport r171379 from mainline
        2011-03-23  Chung-Lin Tang

        PR target/46934
        * config/arm/arm.md (casesi): Use the gen_int_mode() function
        to subtract lower bound instead of GEN_INT().

        Backport r171251 from mainline
        2011-03-21  Daniel Jacobowitz

        * config/arm/unwind-arm.c (__gnu_unwind_pr_common): Correct test
        for barrier handlers.

        Backport r171096 from mainline
        2011-03-17  Chung-Lin Tang

        PR target/43872
        * config/arm/arm.c (arm_get_frame_offsets): Adjust early
        return condition with !cfun->calls_alloca.
RE: [arm-embedded] Simply enable GCC to support -march=armv6s-m as GAS does.
Hello,

I patched arm-arches.def and regenerated arm-tables.opt by running "./genopt.sh ../arm > arm-tables.opt" in the gcc/config/arm directory. The updated patch is below. Is it OK for trunk?

BR,
Terry

2011-09-19  Terry Guo

        * config/arm/arm-arches.def (armv6s-m): New.
        * config/arm/arm-tables.opt: Regenerate.

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 1086233..3123426 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -49,6 +49,7 @@ ARM_ARCH("armv6z", arm1176jzs, 6Z, FL_CO_PROC | FL_FOR_ARCH6Z
 ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
 ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
 ARM_ARCH("armv6-m", cortexm1, 6M, FL_FOR_ARCH6M)
+ARM_ARCH("armv6s-m", cortexm1, 6M, FL_FOR_ARCH6M)
 ARM_ARCH("armv7", cortexa8, 7, FL_CO_PROC | FL_FOR_ARCH7)
 ARM_ARCH("armv7-a", cortexa8, 7A, FL_CO_PROC | FL_FOR_ARCH7A)
 ARM_ARCH("armv7-r", cortexr4, 7R, FL_CO_PROC | FL_FOR_ARCH7R)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index d86e376..23339c7 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -323,28 +323,31 @@ EnumValue
 Enum(arm_arch) String(armv6-m) Value(16)
 
 EnumValue
-Enum(arm_arch) String(armv7) Value(17)
+Enum(arm_arch) String(armv6s-m) Value(17)
 
 EnumValue
-Enum(arm_arch) String(armv7-a) Value(18)
+Enum(arm_arch) String(armv7) Value(18)
 
 EnumValue
-Enum(arm_arch) String(armv7-r) Value(19)
+Enum(arm_arch) String(armv7-a) Value(19)
 
 EnumValue
-Enum(arm_arch) String(armv7-m) Value(20)
+Enum(arm_arch) String(armv7-r) Value(20)
 
 EnumValue
-Enum(arm_arch) String(armv7e-m) Value(21)
+Enum(arm_arch) String(armv7-m) Value(21)
 
 EnumValue
-Enum(arm_arch) String(ep9312) Value(22)
+Enum(arm_arch) String(armv7e-m) Value(22)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(23)
+Enum(arm_arch) String(ep9312) Value(23)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(24)
+Enum(arm_arch) String(iwmmxt) Value(24)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(25)
 
 Enum
 Name(arm_fpu) Type(int)
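The way a one-line addition to a `.def` file turns into a new table entry can be sketched with the X-macro technique. Everything below is a cut-down, hypothetical stand-in: the list macro `ARCH_TABLE`, the `arch_entry` struct and `find_arch` are illustrative names, the flags column is dropped, and the real file is included by the compiler sources rather than written as a list macro.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, shortened stand-in for arm-arches.def:
// architecture name, default CPU, base architecture.
#define ARCH_TABLE(ARM_ARCH)            \
  ARM_ARCH("armv6-m",  cortexm1, 6M)    \
  ARM_ARCH("armv6s-m", cortexm1, 6M)    \
  ARM_ARCH("armv7",    cortexa8, 7)

enum base_arch { BASE_ARCH_6M, BASE_ARCH_7 };

struct arch_entry {
  const char* name;
  const char* default_cpu;
  base_arch base;
};

// Expand every list entry into one table row.
#define MAKE_ENTRY(NAME, CPU, ARCH) { NAME, #CPU, BASE_ARCH_##ARCH },
static const arch_entry all_arches[] = { ARCH_TABLE(MAKE_ENTRY) };
#undef MAKE_ENTRY

// Look up an architecture by the string given to -march=.
const arch_entry* find_arch(const char* name) {
  for (std::size_t i = 0; i < sizeof all_arches / sizeof all_arches[0]; ++i)
    if (std::strcmp(all_arches[i].name, name) == 0)
      return &all_arches[i];
  return nullptr;
}
```

In this model "armv6s-m" is simply a second name that maps to the same CPU and base architecture as "armv6-m", which matches what the new `ARM_ARCH` line in the patch expresses. Because entries are numbered by position, inserting one shifts all later values, which is why the regenerated arm-tables.opt renumbers armv7 and everything after it.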
Re: [RFC] Split -mrecip
On Sat, Sep 3, 2011 at 11:11 PM, Uros Bizjak wrote:

>>> > I've decided to not use four new bits from target_flags, and instead
>>> > created a new mask (recip_mask). Four bits would have fit in target
>>> > bits right now, but in the future we might want to add more
>>> > specialization, like modes for which the reciprocals are active.
>>> >
>>> > What do you think?
>>>
>>> These new flags looks like a nice addition, but I wonder, why we need
>>> separate options to handle vector recip. A vector rsqrt or rdiv is
>>> generated automatically in the same way as scalar rsqrt or rdiv is
>>> generated, so IMO, -mrecip-sqrt and -mrecip-div should be enough.
>>
>> No, the difference does matter. Using reciprocal estimates for scalar
>> divs often results in errors in benchmarks because those sometimes are
>> used to feed integer conversions for either index calculations or
>> printouts. The small rounding errors with the reciprocals lead to
>> incorrect outputs then. Context where the div can be vectorized often
>> don't have this problem (they're then used purely for calculations over
>> arrays of float data). For instance spec2006 and polyhedron break with
>> -mrecip purely because of the scalar reciprocals, but work with only
>> vectorized ones. I.e. users really want to differ between both.
>
> I agree with your analysis.
>
>> Also, when this patch goes in I plan to submit another one that activates
>> vectorized rcp/rsqrt under -ffast-math already (that's what ICC happens to
>> do too).
>
> Great! In the past, we tried to use -mrecip with -ffast-math. IIRC,
> polyhedron broke on scalar rdiv and spec2006 broke on rsqrt. Taking
> into account your analysis above, using separate options and
> activating vectorized ones for -ffast-math makes much sense.
>
>>> For the future - could rs6000 and x86 use the same compile options to
>>> handle reciprocals?
>>
>> I'd guess so. rs6000 uses a hand-written comma-splitter, which we could
>> reuse.
>
> Perhaps rs6000 could adopt our approach in addition to its
> comma-splitter? OTOH, whatever is more convenient, I don't care that
> much. I have CC'd rs6000 maintainer for his opinion.

Looking at this topic again, I'd propose that x86 adopt the approach from rs6000. The rs6000 approach is more extensible and offers the same flexibility, thanks to "!". So, x86 could have "-mrecip=", with all, default, none, div, vec-div, divf, vec-divf, rsqrt, etc. combinations, perhaps some day also using divd & co.

Probably, rs6000 needs to extend its options with a vec- prefix, to conditionally enable vector reciprocals, for the same reason x86 has to.

Uros.
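A comma-separated option with "!" negation, as proposed for "-mrecip=", could be parsed along the following lines. This is a sketch under stated assumptions, not the rs6000 or x86 implementation: the bit names, the item "vec-rsqrt" and the function `parse_recip` are hypothetical, and "default" as well as the mode-specific items such as divf are left out because the thread does not define them.

```cpp
#include <cstddef>
#include <string>

// Hypothetical mask bits, one per kind of reciprocal estimate.
const unsigned RECIP_DIV       = 1u << 0;
const unsigned RECIP_RSQRT     = 1u << 1;
const unsigned RECIP_VEC_DIV   = 1u << 2;
const unsigned RECIP_VEC_RSQRT = 1u << 3;
const unsigned RECIP_ALL =
    RECIP_DIV | RECIP_RSQRT | RECIP_VEC_DIV | RECIP_VEC_RSQRT;

// Parse a list such as "all,!div". Items are applied left to right; a
// leading '!' clears the named bits instead of setting them. Returns
// false on an empty or unknown item.
bool parse_recip(const std::string& arg, unsigned& mask) {
  mask = 0;
  std::size_t pos = 0;
  while (pos <= arg.size()) {
    std::size_t comma = arg.find(',', pos);
    if (comma == std::string::npos)
      comma = arg.size();
    std::string item = arg.substr(pos, comma - pos);
    pos = comma + 1;

    bool invert = !item.empty() && item[0] == '!';
    if (invert)
      item.erase(0, 1);

    unsigned bits;
    if (item == "all") {
      bits = RECIP_ALL;
    } else if (item == "none") {
      bits = RECIP_ALL;     // "none" clears everything,
      invert = !invert;     // so it acts like "!all"
    } else if (item == "div") {
      bits = RECIP_DIV;
    } else if (item == "rsqrt") {
      bits = RECIP_RSQRT;
    } else if (item == "vec-div") {
      bits = RECIP_VEC_DIV;
    } else if (item == "vec-rsqrt") {
      bits = RECIP_VEC_RSQRT;
    } else {
      return false;
    }

    if (invert)
      mask &= ~bits;
    else
      mask |= bits;
  }
  return true;
}
```

With such a scheme, "all,!div" enables everything except scalar division estimates, which is the kind of scalar versus vector distinction argued for in the quoted discussion.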